Background:
I’m using Google’s protobuf, and I would like to read/write several gigabytes of protobuf marshalled data to a file using C++. As it’s recommended to keep the size of each protobuf object under 1MB, I figured a binary stream (illustrated below) written to a file would work. Each offset contains the number of bytes to the next offset until the end of the file is reached. This way, each protobuf can stay under 1MB, and I can glob them together to my heart’s content.
[int32 offset]
[protobuf blob 1]
[int32 offset]
[protobuf blob 2]
...
[eof]
I have a working implementation on GitHub:
src/glob.hpp
src/glob.cpp
test/readglob.cpp
test/writeglob.cpp
But I feel I have written some poor code, and would appreciate some advice on how to improve it. Thus,
Questions:
- I’m using reinterpret_cast<char*> to read/write the 32 bit integers to and from the binary fstream. Since I’m using protobuf, I’m making the assumption that all machines are little-endian. I also assert that an int is indeed 4 bytes. Is there a better way to read/write a 32 bit integer to a binary fstream given these two limiting assumptions?
- In reading from fstream, I create a temporary fixed-length char buffer, so that I can then pass this fixed-length buffer to the protobuf library to decode using ParseFromArray, as ParseFromIstream will consume the entire stream. I’d really prefer just to tell the library to read at most the next N bytes from fstream, but there doesn’t seem to be that functionality in protobuf. What would be the most idiomatic way to pass a function at most N bytes of an fstream? Or is my design sufficiently upside down that I should consider a different approach entirely?
Edit:
- @codymanix: I’m casting to char since istream::read requires a char array, if I’m not mistaken. I’m also not using the extraction operator >> since I read it was poor form to use with binary streams. Or is this last piece of advice bogus?
- @Martin York: Removed new/delete in favor of std::vector<char>. glob.cpp is now updated. Thanks!
Don’t use new[]/delete[].
Instead use a std::vector, as deallocation is guaranteed in the event of exceptions.
Don’t assume that reading will return all the bytes you requested.
Check with gcount() to make sure that you got what you asked for.
Rather than have Glob implement both input and output depending on a switch in the constructor, implement two specialized classes, like ifstream/ofstream. This will simplify both the interface and the usage.