What can I use for fast serialization of data into several files in C++11? To avoid data redundancy, I plan to split the data into several tables and join them on their id numbers.
I am thinking about using:
- simple binary files accessed with fstream.read() and fstream.write(),
- mmap,
- Google protobuf (if I can access a random element instead of iterating over all of them).
All tables will consist of columns with the following datatypes: uint8, uint16, uint32, uint64, string.
Fast random access will be the challenge here. The easiest way to achieve it is to keep every row the same size, so that row n starts at byte offset n × (row size). There’s no easy way to do that with protobufs, unless you assume a conservative maximum size. It should be relatively easy to do this with either of your first two options (assuming you have a reasonable limit on the size of the string).
You can get arbitrarily more complicated, however. protobufs will likely use less space than a naive serialization, so you’ll have memory left over to build an index. Even a relatively small index (say, mapping a table row number to a file offset for every 100th row) will give you fast random access while using far less space than an index with one entry per row. Of course, this is quite a bit more complicated than the simple every-row-is-the-same-size approach.
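For the simple every-row-is-the-same-size approach, a minimal sketch with fstream could look like the following. The Row layout, the 32-byte string limit, and the helper names (encode, decode, write_row, read_row) are made up for illustration and are not taken from the question. Fields are copied in host byte order, so the file is only portable between machines with the same endianness.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical row layout: every field has a fixed width on disk,
// so every row occupies exactly kRowSize bytes.
constexpr std::size_t kMaxName = 32;                    // assumed string limit
constexpr std::size_t kRowSize = 8 + 4 + 1 + kMaxName;  // id + value + len + name

struct Row {
    std::uint64_t id;
    std::uint32_t value;
    std::string name;  // truncated to kMaxName bytes when stored
};

// Pack the fields one by one (host byte order) rather than dumping the
// struct, so compiler padding and the std::string internals never hit disk.
void encode(const Row& r, char* buf) {
    std::memset(buf, 0, kRowSize);
    std::memcpy(buf, &r.id, 8);
    std::memcpy(buf + 8, &r.value, 4);
    std::size_t len = std::min(r.name.size(), kMaxName);
    buf[12] = static_cast<char>(len);
    std::memcpy(buf + 13, r.name.data(), len);
}

Row decode(const char* buf) {
    Row r;
    std::memcpy(&r.id, buf, 8);
    std::memcpy(&r.value, buf + 8, 4);
    std::size_t len = static_cast<unsigned char>(buf[12]);
    r.name.assign(buf + 13, len);
    return r;
}

// Row `index` lives at byte offset index * kRowSize.
void write_row(std::fstream& f, std::uint64_t index, const Row& r) {
    char buf[kRowSize];
    encode(r, buf);
    f.clear();
    f.seekp(static_cast<std::streamoff>(index * kRowSize));
    f.write(buf, static_cast<std::streamsize>(kRowSize));
}

// Returns false if the row lies past the end of the file.
bool read_row(std::fstream& f, std::uint64_t index, Row& out) {
    char buf[kRowSize];
    f.clear();  // reset a possible EOF flag left by an earlier read
    f.seekg(static_cast<std::streamoff>(index * kRowSize));
    if (!f.read(buf, static_cast<std::streamsize>(kRowSize))) return false;
    out = decode(buf);
    return true;
}
```

The file would be opened with std::ios::in | std::ios::out | std::ios::binary (plus std::ios::trunc to create it). The same offset arithmetic carries over to mmap: row n would start at base + n * kRowSize in the mapped region.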