Using Google protobuf, I am saving my serialized message data to a file – each file contains several messages. We have both C++ and Python versions of the code, so I need to use protobuf functions that are available in both languages. I have experimented with SerializeToArray and SerializeAsString, and there seem to be the following unfortunate limitations:
- SerializeToArray: As suggested in one answer, the best way to use this is to prefix each message with its data size. This would work great for C++, but in Python it doesn't look like this is possible – am I wrong?
- SerializeAsString: This generates a serialized string equivalent to its binary counterpart, which I can save to a file. But what happens if one of the characters in the serialization result is \n? How do we find line endings, or the ends of messages for that matter?
Update:
Please allow me to rephrase slightly. As I understand it, I cannot write binary data in C++ because then our Python application cannot read the data, since it can only parse string-serialized messages. Should I instead use SerializeAsString in both C++ and Python? If so, is it then best practice to store such data in a text file rather than a binary file? My gut feeling says binary, but as you can see that doesn't look like an option.
The best practice for concatenating messages in this way is to prepend each message with its size. That way you read in the size (a 32-bit int, say), then read that number of bytes into a buffer and deserialize it. Then read the next size, and so on.
The same goes for writing: first write out the size of the message, then the message itself.
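This works fine in Python too, contrary to the worry in the question. Here is a minimal sketch using the standard `struct` module for a fixed 4-byte little-endian length prefix; the helper names (`write_delimited`, `read_delimited`) are my own, and the raw `bytes` payloads stand in for what `msg.SerializeToString()` would produce and `msg.ParseFromString()` would consume:

```python
import io
import struct

def write_delimited(stream, payload: bytes) -> None:
    # Prefix the message with its size as a 4-byte little-endian int.
    # In real code, payload would be msg.SerializeToString().
    stream.write(struct.pack('<I', len(payload)))
    stream.write(payload)

def read_delimited(stream):
    # Yield each message body: read 4 size bytes, then that many payload bytes.
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # clean end of file
        (size,) = struct.unpack('<I', header)
        yield stream.read(size)  # feed this to msg.ParseFromString(...)

buf = io.BytesIO()
for payload in (b'first', b'second message'):
    write_delimited(buf, payload)
buf.seek(0)
print(list(read_delimited(buf)))  # → [b'first', b'second message']
```

Because the size field tells you exactly where each message ends, embedded bytes like \n in the serialized data are harmless. The C++ side just has to agree on the same prefix width and byte order (e.g. write the length with the same little-endian 32-bit layout).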
See Streaming Multiple Messages on the protobuf documentation for more information.