I’m writing a script in Python that will spit out some data organized as a list of dicts:
[{'name': 'first_thing', 'color': 'blue', 'flavour': 'watermelon'},
 {'name': 'second_thing', 'color': 'red'},
 {'name': 'third_thing', 'color': 'blue', 'size': 'huge!'}]
I am trying to decide on a way to store this data in a file. My considerations:
- I’d like it to be as easy to read as to write, so I can load the data back into a script and manipulate it further.
- I’d like it to be a format that isn’t Python-specific. Maybe later I’ll want to use this data in PHP or something, who knows?
- I’d like it to be a format to which it is easy to append more data. If my file has a list with 1000 of my little dict items in it, I do not want to load all 1000 into memory just to add one more item to the end.
My first try was to use pickle, which meets the "easy" criterion, but it’s Python-specific and I’d have to unpickle, append, then repickle.
Other formats I’ve thought of that seem feasible (with my objections):
- JSON (appending is going to be annoying, maybe)
- shelve (Python-specific)
- CSV (like duct tape, not so classy, but it would probably work)
- Some kind of light database like SQLite (maybe getting too fancy here)
Anybody have arguments for any of these or another format?
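Given your need to append data later, YAML might be the format you’re looking for. It’s designed explicitly to support appending data elements, à la a log file: a new item in a top-level list, or a new document after a `---` separator, can simply be written to the end of the file without touching what is already there. JSON is deliberately a subset of the language (as of YAML 1.2), so a YAML parser can generally read JSON output too, and YAML has tag markup designed for cross-language serialization of custom classes.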
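
As a rough illustration of the append-only idea, here is a minimal sketch that uses only the standard library. It assumes each record is written as one line of a top-level YAML list, with the dict itself serialized by `json.dumps` (which is fine here because a JSON object is also a valid YAML flow mapping for simple data like yours). The function names `append_record` and `iter_records` and the file name `things.yaml` are made up for the example.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one dict as a YAML list item without reading the file first."""
    # json.dumps emits a single-line JSON object, which YAML accepts
    # as a flow mapping, so "- {...}" is one item of a top-level list.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write("- " + json.dumps(record) + "\n")


def iter_records(path):
    """Yield records one at a time (stdlib only, relies on the layout above)."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("- "):
                yield json.loads(line[2:])


# Hypothetical file location, just for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "things.yaml")

append_record(path, {"name": "first_thing", "color": "blue", "flavour": "watermelon"})
append_record(path, {"name": "second_thing", "color": "red"})
# Adding a third item later never loads the first two into memory.
append_record(path, {"name": "third_thing", "color": "blue", "size": "huge!"})

records = list(iter_records(path))
print(records)
```

If you have a YAML library installed (for example PyYAML, which is an assumption on my part and not in the standard library), `yaml.safe_load(open(path))` should give you the same list of dicts in one call, and parsers in other languages should be able to read the file as well. The line-by-line reader above only works because this sketch controls exactly how each line is written; it is not a general YAML parser.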