Apart from PyYAML, are there any safe Python data serialization libraries which correctly handle unicode/str?
For example:
>>> json.loads(json.dumps([u"x", "x"]))
[u'x', u'x'] # Both unicode
>>> msgpack.loads(msgpack.dumps([u"x", "x"]))
['x', 'x'] # Neither are unicode
>>> bson.loads(bson.dumps({"x": [u"x", "x"]}))
{u'x': [u'x', 'x']} # Dict keys become unicode
>>> pyamf.decode(pyamf.encode([u"x", "x"])).next()
[u'x', u'x'] # Both are unicode
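Each interactive check above inspects one sample by eye. A small helper can make the same test mechanical for any `dumps`/`loads` pair. This is a minimal sketch written for Python 3, where `str` is text and `bytes` is binary (the roles `unicode` and `str` play in the Python 2 examples). `same_types` and `round_trip_ok` are hypothetical names, not part of any library mentioned here.

```python
import json

def same_types(a, b):
    """True if a == b and every nested value (and dict key) has the same type."""
    if type(a) is not type(b) or a != b:
        return False
    if isinstance(a, list):
        # a == b already guarantees equal lengths
        return all(same_types(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        # Map each key of b to itself so we can fetch b's key object for
        # an equal key of a (e.g. 1 and True are equal but differ in type).
        b_keys = {k: k for k in b}
        return all(same_types(k, b_keys[k]) and same_types(a[k], b[k]) for k in a)
    return True

def round_trip_ok(dumps, loads, value):
    """Serialize and deserialize value, then check nothing changed type."""
    return same_types(value, loads(dumps(value)))

print(round_trip_ok(json.dumps, json.loads, ["x", {"k": [1, 2.5, None]}]))  # True
print(round_trip_ok(json.dumps, json.loads, ("x",)))  # False: tuple comes back as a list
```

The same helper can be pointed at any other library's encode/decode functions, as long as they take and return a single value.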
Note that I want the serializer to be safe (so pickle and marshal are out). PyYAML is an option, but I dislike the complexity of YAML, so I’d like to know if there are other options.
Edit: there appears to be some confusion about the nature of my data. Some of it is Unicode text (e.g., names) and some of it is binary (e.g., images). So a serialization library that confuses unicode and str is just as useless to me as one that confuses "42" and 42.
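If no off-the-shelf library fits, one workaround is to layer a small tagging scheme on top of the standard `json` module so that binary values survive a round trip as binary. The sketch below assumes Python 3 (`str` for text, `bytes` for binary) and uses a hypothetical marker key `__bytes__` that is assumed never to occur in real data; `dumps_tagged` and `loads_tagged` are illustrative names, not an existing API. Decoding only ever builds dicts, lists, strings, numbers and `bytes`, so it avoids the code-execution risk of pickle.

```python
import base64
import json

_BYTES_TAG = "__bytes__"  # hypothetical marker; assumed absent from real data

def _encode_default(obj):
    # json calls this only for objects it cannot serialize natively
    if isinstance(obj, bytes):
        return {_BYTES_TAG: base64.b64encode(obj).decode("ascii")}
    raise TypeError(f"cannot serialize {type(obj).__name__}")

def _decode_hook(d):
    # Called for every decoded JSON object; turn tagged dicts back into bytes
    if set(d) == {_BYTES_TAG}:
        return base64.b64decode(d[_BYTES_TAG])
    return d

def dumps_tagged(value):
    return json.dumps(value, default=_encode_default)

def loads_tagged(text):
    return json.loads(text, object_hook=_decode_hook)

record = {"name": "Zoë", "image": b"\x89PNG\r\n"}
restored = loads_tagged(dumps_tagged(record))
print(restored == record)  # True
print(type(restored["name"]).__name__, type(restored["image"]).__name__)  # str bytes
```

The trade-offs of this design: binary values grow by about a third because of base64, `bytes` cannot be used as dict keys (JSON keys must be strings), and tuples still come back as lists.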
Have you tried bert?
(To install it, you’ll first have to install erlastic manually, because of this outstanding pull request.)