I have an object defined as:
std::unordered_map<std::string, std::vector<int>> large_obj;
It can store a very large amount of data (many rows), and it works pretty well.
However, I want to back up large_obj to a file and, at some later time, load it from that file into another object.
What is the most efficient way to read/write this object, rather than writing each row to a file in a pre-defined structure?
Keeping in mind that:
- Multiple calls to read() are more expensive than a single call.
- Binary files are faster than text files.
Which approach saves the most time for object I/O?
I wouldn't worry much about the overhead of read() and write(): just use a buffered stream and a data format that can be read and written without skipping back and forth in the data stream.
The serialized stream you are writing out should be close enough to your data representation that you can take over large chunks of the data by simple copy, but still abstract enough to allow reconstruction from older versions of the data format or on machines with differing internal representations.
I usually define a header that includes a magic number, a data format version and a set of values that capture the machine specific parts. For your case, that would be
When reading back data, you compare the values in the header against what you’d expect — if a case appears where this does not match, you can add deserialization code that handles this.
For the rows, I’d use an encoding like
This can be saved and restored efficiently. When your requirements change, just increment the version number, define a new format and adapt the parser code to create the new in-memory representation.