I need to store easily parsable data in a file as an alternative to a database-backed solution (not up for debate). Since it’s going to store a lot of data, a lightweight syntax is preferable. It does not need to be human readable, but it should be easy to parse. Note that there will be multiple types of fields/columns, some of which might be used and some of which won’t.
From my limited experience working without a database, I see several options, all with issues:
- CSV – I could technically do this, and it is very light. However, parsing would be an issue, and it would be painful if I wanted to add a column. Multi-language support is iffy; it is mainly people’s own custom parsers.
- XML – This is the perfect solution on many fronts, except when it comes to parsing and overhead. That’s a lot of tags, which would generate a giant file, and parsing would be very resource-consuming. However, virtually every language supports XML.
- JSON – This is the middle ground, but I don’t really want to do this, as it’s an awkward syntax and parsing is non-trivial. Language support is iffy.
So all have their disadvantages. But what would be the best when trying to aim for language support AND somewhat small file size?
If you’re just using the basics of all these formats, all of the parsers are trivial. If CSV is an option, then for XML and JSON you’re talking blocks of name/value pairs, so there’s not even a recursive structure involved. json.org has support for pretty much any language.
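To illustrate how little parsing is involved when records are flat name/value pairs, here is a minimal Python sketch using the standard `json` module. The field names (`id`, `name`, `email`) are made up for illustration, and one record per line is just one possible layout, not something the question requires.

```python
import json

# Hypothetical records: flat name/value pairs, no nesting.
records = [
    {"id": 1, "name": "alice", "email": "alice@example.com"},
    {"id": 2, "name": "bob"},  # unused fields are simply left out
]

# Serialize one record per line; compact separators drop the optional spaces.
text = "\n".join(json.dumps(r, separators=(",", ":")) for r in records)

# Parsing is one library call per line.
parsed = [json.loads(line) for line in text.splitlines()]
```

Because each record names its own fields, a record that doesn't use a field can omit it, at the cost of repeating the field names in every record.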
That said.
I don’t see what the problem is with CSV. If people write bad parsers, then too bad. If you’re concerned about compatibility, adopt the default CSV model from Excel. Anyone that can’t parse CSV from Excel isn’t going to get far in this world. The weakest support you find in CSV is for embedded newlines and carriage returns. If your data doesn’t have these, then it’s not a problem. The only other issue is embedded quotation marks, and those are escaped in CSV by doubling them. If you don’t have those either, then it’s even more trivial.
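As a rough illustration of the Excel-style rules, this sketch uses Python's standard `csv` module with its built-in `excel` dialect. The rows are invented sample data chosen to include the awkward cases: an embedded quotation mark, an embedded comma, and an embedded newline.

```python
import csv
import io

# Hypothetical rows containing the awkward cases.
rows = [
    ["id", "name", "note"],
    ["1", 'Bob "Bobby" Smith', "likes commas, apparently"],
    ["2", "Alice", "line one\nline two"],
]

# Write with the Excel dialect; newline="" leaves line endings untouched.
buf = io.StringIO(newline="")
csv.writer(buf, dialect="excel").writerows(rows)
encoded = buf.getvalue()

# Read it back: quoted fields may contain commas, newlines and doubled quotes.
round_tripped = list(csv.reader(io.StringIO(encoded, newline=""), dialect="excel"))
```

In `encoded`, the second row's name field is written as `"Bob ""Bobby"" Smith"`: the field is wrapped in quotes and each embedded quote is doubled. A parser that honors those two rules reads the data back unchanged.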
As for “adding a column”, you have that problem with all of these. If you add a column, you get to rewrite the entire file. I don’t see this being a big issue either.
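For CSV, "rewriting the entire file" to add a column might look like the following sketch, which uses `csv.DictReader` and `csv.DictWriter` from the Python standard library. The file contents and the new `email` column are hypothetical, and in-memory strings stand in for real files.

```python
import csv
import io

# Hypothetical existing file contents with two columns.
old_text = "id,name\r\n1,alice\r\n2,bob\r\n"

reader = csv.DictReader(io.StringIO(old_text, newline=""))
new_fields = reader.fieldnames + ["email"]  # "email" is the made-up new column

out = io.StringIO(newline="")
# restval supplies the value for the new column in rows that predate it.
writer = csv.DictWriter(out, fieldnames=new_fields, restval="")
writer.writeheader()
for row in reader:
    writer.writerow(row)

new_text = out.getvalue()
```

Every row is read and written once, so the cost grows with the size of the file, but the code itself stays small.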
If space is your concern, CSV is the most compact, followed by JSON, followed by XML. None of the resulting files can be easily updated in place. They pretty much all would need to be rewritten for any change to existing data. CSV has the advantage that it’s easily appended to, as there’s no closing element (unlike JSON and XML, which end with one).
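A small sketch of the size and append comparison, in Python. The records are invented, the XML is built by hand with made-up `rows`/`row` tag names and no escaping (the sample values contain no special characters), and the size ordering is only demonstrated for this sample.

```python
import csv
import io
import json

# Hypothetical records used only to compare encodings.
fields = ["id", "name", "city"]
records = [["1", "alice", "Oslo"], ["2", "bob", "Lima"]]

def to_csv(rows):
    """Encode rows as CSV text using the csv module's default dialect."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

csv_text = to_csv([fields] + records)
json_text = json.dumps([dict(zip(fields, r)) for r in records],
                       separators=(",", ":"))
xml_text = "<rows>" + "".join(
    "<row>" + "".join(f"<{k}>{v}</{k}>" for k, v in zip(fields, r)) + "</row>"
    for r in records
) + "</rows>"

sizes = {"csv": len(csv_text), "json": len(json_text), "xml": len(xml_text)}

# Appending a record to CSV is plain concatenation at the end of the file.
appended = csv_text + to_csv([["3", "carol", "Kyiv"]])
# The JSON array ends with "]" and the XML ends with "</rows>", so a new
# record has to be inserted before that closing delimiter instead.
```

CSV states the field names once in the header, JSON repeats them in every record, and XML repeats them twice per value (opening and closing tag), which is where the size difference comes from.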