My Python application currently uses the python-memcached API to set and get objects in memcached. This API uses Python’s native pickle module to serialize and de-serialize Python objects.
This API makes it simple and fast to store nested Python lists, dictionaries and tuples in memcached, and reading these objects back into the application is completely transparent — it just works.
But I don’t want to be limited to using Python exclusively, and if all the memcached objects are serialized with pickle, then clients written in other languages won’t work.
Here are the cross-platform serialization options I’ve considered:
- XML – the main benefit is that it’s human-readable, but that’s not important in this application. XML also takes a lot of space, and it’s expensive to parse.
- JSON – seems like a good cross-platform standard, but I’m not sure it preserves Python object types when the data is read back from memcached. For example, according to this post, tuples are transformed into lists when using simplejson. It also seems like adding elements to the JSON structure could break code written against the old structure.
- Google Protocol Buffers – I’m really interested in this because it seems very fast and compact, at least 10 times smaller and faster than XML. It’s not human-readable, but that’s not important for this app, and it seems designed to support growing the structure without breaking old code.
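To make the JSON concern concrete, here is a minimal sketch comparing a pickle round trip with a JSON round trip. It assumes the standard-library `json` module rather than simplejson, and the `record` dictionary and its keys are made up for illustration.

```python
import json
import pickle

# Hypothetical sample object with a nested tuple and list.
record = {"id": 7, "point": (1, 2), "tags": ["a", "b"]}

# pickle preserves Python-specific types such as tuple.
via_pickle = pickle.loads(pickle.dumps(record))

# JSON only has arrays, so the tuple is written as an array
# and comes back as a list.
via_json = json.loads(json.dumps(record))

print(type(via_pickle["point"]).__name__)  # tuple
print(type(via_json["point"]).__name__)    # list

# A reader that looks up keys by name is unaffected when a newer
# writer adds a field ("color" here is an invented example).
newer = json.loads('{"id": 7, "point": [1, 2], "color": "red"}')
x, y = newer["point"]
```

If the application only iterates or indexes these values, the tuple-to-list change may not matter. Code that uses them as dictionary keys or relies on `isinstance(..., tuple)` would need adjusting.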
Considering the following priorities for this app, what’s the ideal object serialization method for memcached?
- Cross-platform support (Python, Java, C#, C++, Ruby, Perl)
- Handling nested data structures
- Fast serialization/de-serialization
- Minimum memory footprint
- Flexibility to change structure without breaking old code
I tried several methods and settled on compressed JSON as the best balance between speed and memory footprint. Python’s native pickle module is slightly faster, but the resulting objects can’t be used with non-Python clients.
I’m seeing 3:1 compression, so all the data fits in memcached, and the app gets sub-10ms response times, including page rendering.
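The compressed JSON approach can be sketched roughly as follows. This is an illustration, not my exact code: it assumes zlib as the compressor, and the helper names `dumps_compressed` and `loads_compressed` and the sample data are hypothetical.

```python
import json
import zlib


def dumps_compressed(obj, level=6):
    """Serialize obj to compact JSON, then zlib-compress the UTF-8 bytes."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level)


def loads_compressed(blob):
    """Reverse of dumps_compressed: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Invented sample data; real compression ratios depend on the payload.
sample = {
    "users": [
        {"id": i, "name": "user%d" % i, "active": True} for i in range(200)
    ]
}

blob = dumps_compressed(sample)
restored = loads_compressed(blob)

plain_size = len(json.dumps(sample).encode("utf-8"))
print("plain bytes:", plain_size, "compressed bytes:", len(blob))
```

The resulting bytes are what would be stored as the memcached value. A client in another language only needs a zlib inflate routine and a JSON parser to read it back.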
Here’s a comparison of JSON, Thrift, Protocol Buffers and YAML, with and without compression:
http://bouncybouncy.net/ramblings/posts/more_on_json_vs_thrift_and_protocol_buffers/
It looks like this test got the same results I did with compressed JSON. Since JSON doesn’t require me to pre-define each structure, compressed JSON seems like the fastest and smallest cross-platform answer.