I was generally considering writing a protocol for my own purposes over TCP and hit the problem of determining the end of the message.
Looking at HTTP, I can see that the message length is given in Content-Length, which could be the way to determine that the message has been completely received. Is that the only way to do this? What happens if this header is missing, since AFAIK headers are optional in an HTTP message?
Thanks.
There are other ways, e.g.:
Have a special delimiter mark the end (and perhaps the start of the next message). e.g. you could end every message with a newline, so to read a message you read everything up to a newline. In that case you need to ensure the message content does not itself contain a newline, e.g. by escaping newlines when you send and un-escaping them as you read, or by encoding the messages with an encoding that never produces newlines (e.g. base64 or ASCII hex).
Format the messages so they have a structure that lets a parser detect the start and end of a message implicitly. e.g. if you send JSON, you parse the JSON as you receive data, and once all the { and [ characters have been matched (ignoring any that appear inside string values), you've got a full message.
Prefix each message with a length. This is quite like the “Content-Length” in HTTP, but instead you encode the length in binary, e.g. the first 4 bytes of each message hold the length of the data that follows.
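To make the delimiter and length-prefix options concrete, here is a minimal Python sketch. The function names (`read_exact`, `write_delimited`, `read_prefixed`, ...) are made up for illustration, and `stream` is assumed to be a binary file-like object, such as what `sock.makefile("rwb")` returns for a TCP socket. The 4-byte big-endian length is just one possible choice.

```python
import base64
import struct


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, looping because one read may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed in the middle of a message")
        buf.extend(chunk)
    return bytes(buf)


# Option 1: newline-delimited. base64 output never contains a newline,
# so the delimiter cannot show up inside the encoded payload.
def write_delimited(stream, payload: bytes) -> None:
    stream.write(base64.b64encode(payload) + b"\n")


def read_delimited(stream) -> bytes:
    line = stream.readline()
    if not line.endswith(b"\n"):
        raise EOFError("stream closed before the delimiter arrived")
    return base64.b64decode(line[:-1])


# Option 2: length-prefixed. The first 4 bytes hold the payload length
# as an unsigned big-endian integer (an assumption of this sketch).
def write_prefixed(stream, payload: bytes) -> None:
    stream.write(struct.pack("!I", len(payload)) + payload)


def read_prefixed(stream) -> bytes:
    (length,) = struct.unpack("!I", read_exact(stream, 4))
    return read_exact(stream, length)
```

With the length prefix the payload needs no escaping at all, which is why it is a common choice for binary protocols. In a real protocol you would probably also want to reject absurdly large lengths before reading.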
As for what happens in HTTP when Content-Length is missing: that's trickier, as there are several cases to deal with.
e.g. for HTTP/1.0, a request must have a Content-Length if it actually has a body. For HTTP/1.1, a request might instead use a transfer encoding (e.g. chunked encoding), where the length can be parsed out of the message body itself.
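To illustrate how the length is parsed out of the body with chunked encoding, here is a rough Python sketch. `read_chunked_body` is a made-up name, and `stream` is assumed to be a buffered binary file-like object (e.g. from `sock.makefile("rb")`) positioned just after the headers. Chunk extensions and trailer fields are skipped rather than interpreted.

```python
def read_chunked_body(stream) -> bytes:
    """Decode a chunked body: each chunk is '<hex size>\\r\\n<data>\\r\\n',
    and a chunk of size 0 marks the end of the message."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise EOFError("stream ended inside a chunk-size line")
        # Anything after ';' is a chunk extension; this sketch ignores it.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        chunk = stream.read(size)
        if len(chunk) != size:
            raise EOFError("stream ended inside a chunk")
        body.extend(chunk)
        if stream.read(2) != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
    # After the last chunk come optional trailer fields, then an empty line.
    while True:
        line = stream.readline()
        if line == b"":
            raise EOFError("stream ended before the final empty line")
        if line == b"\r\n":
            break
    return bytes(body)
```

The point is that the receiver never needs the total length up front: it only needs the size of the next chunk, and the zero-size chunk tells it the message is complete.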
For HTTP responses, the end of the message can also be indicated by closing the connection; everything from the end of the headers to the end of the stream is then regarded as the message body. Some more info here
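The close-delimited case looks roughly like this in Python. `read_until_close` is a hypothetical helper, and `sock` is assumed to be a connected socket on which the headers have already been consumed.

```python
import socket


def read_until_close(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Collect everything the peer sends until it closes the connection."""
    parts = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # recv() returning b"" means the peer closed its side
            break
        parts.append(data)
    return b"".join(parts)
```

The drawbacks are that you only get one message per connection, and the receiver cannot tell a complete message from one that was cut short by a dropped connection.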