I already read this question about socket synchronization, but I still don't get it.
Recently I was working on a relatively simple client/server app where the communication happens over a TCP socket. The client is written in PHP, using the C-like functions PHP provides for working with sockets (especially fsockopen and fgetc); the server is written in node.js and uses a Stream to output data.
The protocol is quite simple, the message is just a string which ends with a 0-byte character.
Basically it works like this:
SERVER: Message 1
CLIENT: Ack 1
SERVER: Message 2
CLIENT: Ack 2
....
This worked fine, because my client processed one message at a time, reading character by character from the socket until it encountered a 0-byte, which marks the end of the message. The client then writes back to the server that it has successfully received the message (that's the Ack <message id> part).
Now this happened:
SERVER: Message 1
CLIENT: Ack 1
SERVER: Message 2
CLIENT: Ack 2
SERVER: Message 3
Message 4
Message 5
Message 6
CLIENT: <DOH!>
....
In other words, the server unexpectedly sent multiple messages to the client in one “batch”, although every message is a single stream.write(...) operation on the server. It seemed like the messages were buffered somewhere and then sent to the client all at once. My client code couldn't cope with multiple messages in the socket WITHOUT an Ack response in between, so it cut off the remaining messages after id 3.
So my question is:
- How synchronized are sockets in their reads and writes? From the question above I understand that a socket is basically two uni-directional pipes. Does that mean they are not synchronized at all?
- How can it happen that some messages were sent to my client in a simple “one message, one ack” manner and then suddenly multiple messages are written to the stream?
- Does it actually change the picture if the socket is opened in a blocking/non-blocking manner?
I tested this on an Ubuntu VM (so there was no load or anything else that could provoke strange behaviour) using PHP 5.4 and node 0.6.x.
TCP is an abstraction of a bi-directional byte stream, and as such has no concept of messages and cannot preserve message boundaries. There is no guarantee how multiple send() or recv() calls will map to TCP packets. You should treat send() as if calling it multiple times is equivalent to calling it once with the concatenation of all the data. More importantly, when receiving, you should make sure that your code interprets the incoming data in exactly the same way, no matter how it was split over individual recv() calls.
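To make the "concatenation" point concrete, here is a minimal Python sketch. It uses socket.socketpair() as a stand-in for a real TCP connection (an assumption made so the example runs on its own), and send_separately is a hypothetical helper name. Each message gets its own sendall() call, yet the receiver only ever sees one undivided byte stream; how many recv() calls it takes to drain it is not guaranteed.

```python
import socket

def send_separately(messages):
    """Send each message with its own sendall() call and return
    the chunks that the receiving side got from recv()."""
    # Connected stream sockets, standing in for a TCP connection.
    sender, receiver = socket.socketpair()
    try:
        for message in messages:
            sender.sendall(message)   # one call per message
        sender.close()                # signals end-of-stream to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:             # empty bytes means the peer closed
                break
            chunks.append(chunk)
        return chunks
    finally:
        sender.close()
        receiver.close()

messages = [b"Message 3\0", b"Message 4\0", b"Message 5\0"]
chunks = send_separately(messages)

# The number of chunks may be 1, 3, or anything else; do not rely on it.
print(len(chunks), "recv() call(s) returned data")
# What is guaranteed is the byte stream itself:
print(b"".join(chunks) == b"".join(messages))  # prints True
```

The only property the receiver can depend on is the last line: the bytes arrive complete and in order, with no trace of where each send call began or ended.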
To receive properly, you can use a buffer where you store incomplete messages. But be careful that when you have an incomplete message in a buffer, the next recv() call may complete the current message, as well as provide zero or more complete messages, and possibly part of another incomplete message.
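A buffer like that could be sketched in Python as follows. This is only an illustration under the question's assumption that each message ends with a 0-byte; MessageBuffer and feed are hypothetical names, not part of any library.

```python
class MessageBuffer:
    """Collects raw bytes from recv() and hands back complete messages."""

    def __init__(self, delimiter=b"\0"):
        self._delimiter = delimiter
        self._pending = bytearray()   # bytes of a message not yet terminated

    def feed(self, data):
        """Add the bytes from one recv() call.

        Returns a list of the messages completed by this call, which may
        be empty or contain several entries. Any trailing partial message
        stays in the buffer until a later call completes it.
        """
        self._pending.extend(data)
        # Everything before the last delimiter is complete; the final
        # piece is either empty or the start of the next message.
        *complete, rest = bytes(self._pending).split(self._delimiter)
        self._pending = bytearray(rest)
        return complete


buf = MessageBuffer()
# Simulated recv() results that ignore message boundaries entirely.
for chunk in (b"Mess", b"age 3\0Message 4\0Me", b"ssage 5\0"):
    for message in buf.feed(chunk):
        print(message.decode())       # send the matching Ack here
```

With this in place the client acknowledges each message as it comes out of feed(), so it no longer matters whether the server's writes arrive one at a time, batched together, or split in the middle of a message.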
Blocking versus non-blocking mode doesn't change anything here. It only affects how your application interfaces with the OS, not how the data is split up on the wire.