I have a set of classes whose data I want to serialize. There is a lot of it, though: we’re talking about a std::map holding up to a million or more class instances.
Not wishing to optimize my code too early, I first tried a simple, clean XML implementation and used TinyXML to write the data out, but it was far too slow. So I’ve started looking at Boost.Serialization instead, writing and reading either text (ASCII) or binary archives.
It seems much better suited to the task, as I don’t have to allocate all that memory as overhead before I can start writing.
My question is essentially how to go about planning an optimal serialization strategy for a file format. I don’t particularly want to serialize the whole map if it’s not necessary, as it’s really only the contents I’m after. Having played around with serialization a little (and looked at the output), I don’t understand how the loading code could know when it has reached the end of the map, for example, if I simply save out all the items one after another. What issues do you need to consider when planning a serialization strategy?
Thanks.
There are many advantages to Boost.Serialization. For instance, just adding a method with a specified signature to a class allows the framework to serialize and deserialize its data. Boost.Serialization also includes serializers and readers for the standard STL containers, so you don’t have to worry about whether every key has been stored (it will be) or how to detect the last entry in the map when deserializing (that is handled automatically).
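To make the "end of the map" point concrete: as far as I know, the container support writes the number of elements ahead of the elements themselves, so the reader knows exactly how many to expect. Here is a minimal hand-rolled sketch of that idea using only the standard library. It is not Boost’s actual archive format; the `Item` struct and the `save_map`/`load_map` functions are hypothetical names made up for illustration, and keys are assumed to be non-empty with no whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical record type standing in for one of your classes.
struct Item {
    int id;
    double weight;
};

// Write the element count first, then one key/value pair per line.
void save_map(std::ostream& out, const std::map<std::string, Item>& items) {
    out << items.size() << '\n';
    for (const auto& entry : items) {
        // Assumption of this sketch: keys are non-empty and contain no spaces.
        assert(!entry.first.empty());
        assert(entry.first.find(' ') == std::string::npos);
        out << entry.first << ' ' << entry.second.id << ' '
            << entry.second.weight << '\n';
    }
}

// Read the count, then exactly that many pairs. Anything that follows
// in the stream is left untouched, so no end-of-map marker is needed.
std::map<std::string, Item> load_map(std::istream& in) {
    std::map<std::string, Item> items;
    std::size_t count = 0;
    in >> count;
    for (std::size_t i = 0; i < count; ++i) {
        std::string key;
        Item item{0, 0.0};
        if (!(in >> key >> item.id >> item.weight)) {
            break;  // truncated or malformed input: stop early
        }
        items.emplace(std::move(key), item);
    }
    return items;
}
```

Writing to a std::stringstream with save_map and reading it back with load_map should give you the same entries, even if other data follows the map in the same stream. With Boost.Serialization you would not write any of this yourself; serializing the map object does the equivalent bookkeeping for you.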
There are, however, some considerations to make. For example, if your class has fields that are calculated from other data or exist only to speed things up, such as indexes or hash tables, you don’t have to store them, but you do have to reconstruct these structures from the data read back from disk.
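A rough sketch of that pattern, again with only the standard library: the class below saves its primary data and rebuilds its lookup table after loading. `NameTable` and all of its members are hypothetical names invented for this example, and names are assumed to contain no whitespace. In Boost.Serialization the rebuilding step would typically sit in the load half of a split save/load pair.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical class: names_ is the real data, index_ is a derived lookup table.
class NameTable {
public:
    void add(const std::string& name) {
        index_[name] = names_.size();
        names_.push_back(name);
    }

    // Returns the position of `name`, or -1 if it is not present.
    long find(const std::string& name) const {
        auto it = index_.find(name);
        return it == index_.end() ? -1 : static_cast<long>(it->second);
    }

    // Only the primary data is written; index_ is deliberately skipped.
    void save(std::ostream& out) const {
        out << names_.size() << '\n';
        for (const std::string& name : names_) {
            out << name << '\n';
        }
    }

    // Read the primary data back, then reconstruct the derived index.
    void load(std::istream& in) {
        names_.clear();
        std::size_t count = 0;
        in >> count;
        std::string name;
        for (std::size_t i = 0; i < count && (in >> name); ++i) {
            names_.push_back(name);
        }
        rebuild_index();
    }

private:
    void rebuild_index() {
        index_.clear();
        for (std::size_t i = 0; i < names_.size(); ++i) {
            index_[names_[i]] = i;
        }
        // Duplicate names share one index slot, so it can only be smaller.
        assert(index_.size() <= names_.size());
    }

    std::vector<std::string> names_;
    std::unordered_map<std::string, std::size_t> index_;
};
```

The trade-off is a smaller file and no risk of the stored index disagreeing with the data, at the cost of some extra work on every load. With a million entries it is worth measuring whether rebuilding is actually cheaper than reading the index from disk.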
As for the ‘file format’ you mention, I think we sometimes focus on the format rather than on the data. The exact format of the file doesn’t matter as long as you are able to retrieve the data seamlessly using (say) Boost.Serialization. If you want to share the file with other utilities that don’t use Boost.Serialization, that’s another matter. But purely for the purposes of (de)serialization, you don’t have to care about the internal file format.