I’m trying to write an application which uses Google’s protocol buffers to deserialize data (sent from another application using protocol buffers) over a TCP connection. The problem is that protocol buffers in Python appear to deserialize only from a string. Since TCP doesn’t have well-defined message boundaries and one of the messages I’m trying to receive has a repeated field, I won’t know how much data to receive before passing the string to be deserialized.
Are there any good practices for doing this in Python?
Don’t just write the serialized data to the socket. First send a fixed-size field containing the length of the serialized object.
The sending side is roughly:
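A minimal sketch of such a sender, assuming a 4-byte big-endian length prefix built with `struct`. The helper name `send_message` is hypothetical, and `payload` stands for whatever your message's `SerializeToString()` returned:

```python
import socket
import struct


def send_message(sock, payload):
    """Send one length-prefixed message (hypothetical helper).

    Assumption: the length is a 4-byte unsigned big-endian integer.
    `payload` is the already-serialized message as bytes, for example
    the result of calling SerializeToString() on a protobuf message.
    """
    header = struct.pack(">L", len(payload))
    # sendall() keeps writing until every byte is sent or an error occurs.
    sock.sendall(header + payload)
```

The width and byte order of the prefix are a choice, not something protocol buffers dictates; both ends just have to agree on it.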
And the recv’ing side becomes something like:
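A sketch of the matching receiver, again assuming a 4-byte big-endian length prefix. `recv_exactly` and `recv_message` are hypothetical helper names. The loop is needed because a single `recv()` call may return fewer bytes than requested:

```python
import socket
import struct


def recv_exactly(sock, n):
    """Read exactly n bytes from the socket, or raise if it closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # recv() returning b"" means the peer closed the connection.
            raise ConnectionError("socket closed before full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock):
    """Read one length-prefixed message and return its raw payload bytes."""
    # Assumption: 4-byte unsigned big-endian length, then the payload.
    (length,) = struct.unpack(">L", recv_exactly(sock, 4))
    return recv_exactly(sock, length)
```

The returned bytes can then be handed to `ParseFromString()` on an instance of your generated message class.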
This is a common design pattern for socket programming. Most designs extend the over-the-wire structure to include a type field as well, so your receiving side becomes something like:
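One possible shape for a type-aware receiver, assuming a header of a 2-byte type followed by a 4-byte length (both big-endian). The field widths, the `decoders` mapping, and the function names are all illustrative. In real use each decoder would typically create the right protobuf message and call `ParseFromString()` on it:

```python
import socket
import struct

# Assumed header layout: 2-byte type, then 4-byte payload length.
HEADER = struct.Struct(">HL")


def recv_exactly(sock, n):
    """Read exactly n bytes from the socket, or raise if it closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_typed_message(sock, decoders):
    """Read one typed message and decode it with the matching callable.

    `decoders` maps a numeric type code to a function taking the raw
    payload bytes (a hypothetical registry for this sketch).
    """
    msg_type, length = HEADER.unpack(recv_exactly(sock, HEADER.size))
    # Consume the payload before the lookup so that an unknown type
    # still leaves the stream positioned at the next header.
    payload = recv_exactly(sock, length)
    if msg_type not in decoders:
        raise ValueError(f"unknown message type {msg_type}")
    return msg_type, decoders[msg_type](payload)
```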
You end up with an over-the-wire message format that looks like:
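As an illustration of that layout, here is a sketch that builds one frame. The 2-byte type and 4-byte length are assumed widths, and `pack_frame` is a hypothetical helper:

```python
import struct

# Assumed frame layout (field widths are illustrative):
#   offset 0: type   (2 bytes, unsigned, big-endian)
#   offset 2: length (4 bytes, unsigned, big-endian, payload bytes only)
#   offset 6: value  (`length` bytes, e.g. a serialized protobuf message)
HEADER = struct.Struct(">HL")


def pack_frame(msg_type, payload):
    """Build one type-length-value frame as bytes."""
    return HEADER.pack(msg_type, len(payload)) + payload


frame = pack_frame(1, b"abc")
# frame == b"\x00\x01\x00\x00\x00\x03abc"
```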
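This does a reasonable job of future-proofing the wire protocol against unforeseen requirements: new message types can be added later, and a receiver can use the length to skip over any type it doesn’t recognize. It’s a Type-Length-Value protocol, which you’ll find again and again and again in network protocols.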