Implementing the Binary Memcached Protocol with OCaml and Bitstring
22 Aug 2014I’ve known OCaml for a while, but I never really put it to use outside of academic work back in university. After reading Real World Ocaml my excitement for Ocaml reappeared and I wanted to try out Core and Async. At the back of my mind, I also remembered bitstring, an awesome Ocaml library for creating and pattern matching on bitstrings. To cover all three, I decided to write a Memcached client using the binary protocol.
I’ll try to cover my experience in a number of blog posts. In this first post, I’ll be talking about using bitstring for parsing and constructing binary data for the Memcached protocol.
Constructing Requests and Parsing Responses
I started out by studying the Memcached binary protocol. The general format of a packet is a fixed size header followed by three optional variable size components (command-specific extras, the key and the value):
Byte/ 0 | 1 | 2 | 3 |
/ | | | |
|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
+---------------+---------------+---------------+---------------+
0/ HEADER /
/ /
/ /
/ /
+---------------+---------------+---------------+---------------+
24/ Command-specific extras (as needed) /
+/ (note length in the extras length header field) /
+---------------+---------------+---------------+---------------+
m/ Key (as needed) /
+/ (note length in key length header field) /
+---------------+---------------+---------------+---------------+
n/ Value (as needed) /
+/ (note length is total body length header field, minus /
+/ sum of the extras and key length body fields) /
+---------------+---------------+---------------+---------------+
The header is almost similar for requests and responses and can be described in the following way:
Request header:
Byte/ 0 | 1 | 2 | 3 |
/ | | | |
|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
+---------------+---------------+---------------+---------------+
0| Magic | Opcode | Key length |
+---------------+---------------+---------------+---------------+
4| Extras length | Data type | Reserved |
+---------------+---------------+---------------+---------------+
8| Total body length |
+---------------+---------------+---------------+---------------+
12| Opaque |
+---------------+---------------+---------------+---------------+
16| CAS |
| |
+---------------+---------------+---------------+---------------+
Total 24 bytes
Response header:
Byte/ 0 | 1 | 2 | 3 |
/ | | | |
|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
+---------------+---------------+---------------+---------------+
0| Magic | Opcode | Key Length |
+---------------+---------------+---------------+---------------+
4| Extras length | Data type | Status |
+---------------+---------------+---------------+---------------+
8| Total body length |
+---------------+---------------+---------------+---------------+
12| Opaque |
+---------------+---------------+---------------+---------------+
16| CAS |
| |
+---------------+---------------+---------------+---------------+
Total 24 bytes
Let’s go through the header fields one by one:
Magic
is a magic byte which distinguishes requests from responses. It must be0x80
for requests and0x81
for responses.Opcode
specifies the type of command, e.g.GET
is0
andSET
is1
.Key Length
specifies the size in bytes of theKey
-part of the packet.Extras length
specifies the size in bytes of theCommand-specific extras
-part of the packet.Data type
is currenly not used and should always be0
.- Bytes 7-8 is either
Reserved
in the request orStatus
in the response. The status corresponds to an error code or0
if successful. Total body length
is the total size of the body in bytes, i.e.Total body length = Key length + Key length + Value length
. The value length is not given explicitly in the header, but can be calculated asTotal body length - Key length - Extras length
.Opaque
is 4 bytes which are passed along in a request and passed back unmodified in the response.CAS
is used for data version checking. When storing a value you can optionally set its CAS value. Future modifications will be rejected if a matching CAS value is not provided.
As you can see, the only real difference between a request and a response header is that bytes 7-8 are reserved in the request header and used for status in the response header. Ignoring this difference, we can define a unified type for Memcached headers:
I’ve omitted data type, since it’s currently not used as noted above.
Now that we have a header type, we just need to figure out how to marshal and unmarshal it from binary data. We’ll use the bitstring library for this purpose. It provides a very convenient way to construct binary data and pattern match on binary data, reminiscent of bit syntax in Erlang. Given a header, we can construct a bitstring like so:
BITSTRING { ... }
will return a value of type bitstring
. Each line inside the braces has the form value : size
. The first line thus says the first 8 bits should hold the value 0x80 (the magic byte for requests).
Given a bitstring we can similarly parse it to a header in the following manner using the bitmatch
operator:
bitmatch bits with [patterns]
will match bits
of type bitstring
against each pattern, which follows the same syntax as for BITSTRING
. As you can tell, it’s much simpler to build and parse binary strings with bitstring compared to manual bit fiddling!
Both of these functions work with the type bitstring
defined in the bitstring library. So how do we read a bitstring from IO or write a bitstring to IO? The bitstring library defines a number of functions which interact with channels (channels are the IO abstraction in the standard library):
Given those, it should be simple to parse a header from a channel:
At this point we can read and write Memcached headers to and from channels. To handle complete Memcached packets, we need to consider the variable sized body too. Let’s define a packet type:
The key and value are strings, while the extras contain binary data (a bitstring). We’re now ready to read and write entire packets from channels. Here’s the code:
As a trivial example, we can now send a GET
request for the key foo
to a Memcached server running on localhost port 11211:
Obvious this is still a very low level interface, but it’s easy to build on top of the building blocks we’ve implements. Defining functions with a simpler signature to get and set keys should be easy, but I’ll leave that as an exercise for the reader.
Next time…
In this blog post we’ve developed a primitive OCaml library for talking to Memcached with the binary protocol using bitstring. The code is based on the standard library, which is honest not that great, and all IO is synchronous. In the next installment we’ll try using Core and Async for asynchronous IO. Stay tuned!