I want to send ‘packets’ of data (i.e. discrete messages) between two programs through named pipes. Given that I have to supply a buffer and a buffer size to read, and given that the read call is blocking (I believe), I either need a buffer size that guarantees I never get an under-run, or I need to know the size of the message up front. I don’t want the sending program to have to know the size of the buffer and pad each message out to fill it.
As I see it, there are three ways to do this.
- Prefix each packet with the size of the message being sent, so the listening program can read exactly that many bytes.
- Read from the pipe a byte at a time and listen for a special end-of-stream value.
- A better way
In the first case I would be able to create a buffer of known size and read into it at once. In the second case I would have to read with a one-byte buffer. This might either be perfectly OK or a massively inefficient travesty.
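To make the cost of the second option concrete, here is a minimal sketch of a reader that collects one message by calling read() once per byte until it sees a terminator. `read_until_eos` is a hypothetical helper, and the terminator value is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one message terminated by the byte `eos`,
 * issuing one read() system call per byte. Returns the message length
 * (terminator not included), or -1 on error, on end of file before the
 * terminator, or if the message does not fit in `cap` bytes. */
ssize_t read_until_eos(int fd, unsigned char eos, unsigned char *buf, size_t cap)
{
    size_t len = 0;
    for (;;) {
        unsigned char c;
        ssize_t n = read(fd, &c, 1);      /* one system call per byte */
        if (n < 0 && errno == EINTR)
            continue;                     /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;                    /* error, or EOF with no terminator */
        if (c == eos)
            return (ssize_t)len;          /* complete message */
        if (len == cap)
            return -1;                    /* message too long for the buffer */
        buf[len++] = c;
    }
}
```

Every byte costs a system call, which is where the inefficiency comes from. Wrapping the descriptor with fdopen() and reading with getc() would let the stdio buffer absorb most of that cost, but it does not solve the problem of choosing a terminator that never appears in the data.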
The only reason I would go for the second approach would be for more flexible input (for example, manual interaction if I wanted it).
Which is the best way to go?
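With named pipes, writes are (or can be) atomic: POSIX guarantees that a write of no more than PIPE_BUF bytes is not interleaved with data from other writers (PIPE_BUF is at least 512 bytes, and is 4096 on Linux). Within that limit, if you write, say, 1024 bytes to the pipe, a read call on the other end that asks for at least 1024 bytes will receive all 1024 bytes at once, even if there is more data in the pipe at the time of the read (though it may also pick up some of that following data, because the pipe does not preserve message boundaries). Further, and always, if there are just 1024 bytes in the named pipe and a read requests 4096 bytes, it will get the 1024 bytes on the first attempt, and only block on a subsequent attempt.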
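A small experiment illustrates these rules. It is a sketch that uses an anonymous pipe from pipe() rather than a FIFO created with mkfifo(), on the assumption that both have the same read and write semantics (a FIFO is a pipe with a name); `one_read_after_writes` is a hypothetical test helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: perform each write in `sizes` on a fresh pipe,
 * then issue a single read() of up to `want` bytes and return what that
 * one read() returned (-1 if setup failed). Nothing reads while we
 * write, so the total must fit in the pipe's capacity (64 KiB by
 * default on Linux). */
ssize_t one_read_after_writes(const size_t *sizes, size_t count, size_t want)
{
    static char data[8192];
    int fds[2];
    ssize_t result = -1;

    if (want > sizeof data || pipe(fds) != 0)
        return -1;
    memset(data, 'x', sizeof data);
    for (size_t i = 0; i < count; i++) {
        /* Each write of at most PIPE_BUF bytes is atomic. */
        if (sizes[i] > sizeof data ||
            write(fds[1], data, sizes[i]) != (ssize_t)sizes[i])
            goto out;
    }
    /* read() returns whatever is available, up to `want` bytes;
     * it only blocks when the pipe is empty. */
    result = read(fds[0], data, want);
out:
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Writing 1024 bytes and asking for 4096 returns 1024; writing 4096 and asking for 1024 returns 1024; and two separate 100-byte writes come back from one 4096-byte read as 200 bytes. That last case is why the reader needs some other way to find where one message ends.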
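You say that you have to supply a buffer and a buffer size to read. You do…
You also say that the read is blocking (you believe). It is, unless you set O_NONBLOCK on the file descriptor…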
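A minimal sketch of switching an already-open descriptor to non-blocking mode with fcntl(), and of telling "no data yet" apart from a real error. `set_nonblocking` and `try_read` (with its -2 return convention) are hypothetical helpers. For a FIFO you can also pass O_NONBLOCK to open() directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to an open descriptor while
 * preserving its other file status flags.
 * Returns 0 on success, -1 (errno set by fcntl) on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}

/* Hypothetical helper: read from a non-blocking descriptor.
 * Returns the byte count (0 means end of file, i.e. no writers left),
 * -2 if no data is available yet, or -1 on any other error. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;                 /* would have blocked: try again later */
    return n;
}
```

With the flag set, a read on an empty pipe whose writer is still open fails at once with EAGAIN (or EWOULDBLOCK) instead of waiting, so the reader typically waits with poll() or select() before trying again.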
What sort of messages are you sending? What size are you dealing with? Kilobytes, megabytes, bigger?
There is no particular problem with having, say, a 4 KB buffer in the reader and reading the message in chunks. The issue is knowing when you have reached the end of the message. The great majority of protocols send the length up front, because that makes it easy to write reliable reader code.
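Once the reader knows the length, reading in 4 KB chunks is a loop that keeps calling read() until everything has arrived, since any single call may return fewer bytes than requested. A sketch; `read_exact` and the `CHUNK` size are illustrative choices, not something the pipe requires.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum { CHUNK = 4096 };   /* illustrative per-read chunk size */

/* Hypothetical helper: read exactly `len` bytes into `buf`, asking for
 * at most CHUNK bytes per read() call and looping over short reads.
 * Returns 0 on success, -1 on error or premature end of file. */
int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        size_t want = len < CHUNK ? len : CHUNK;
        ssize_t n = read(fd, p, want);
        if (n < 0 && errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;             /* error, or writer closed too early */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```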
If you are going to use an ‘end of stream’ (EOS) marker, you are doing ‘in-band signalling’, and that causes trouble. What character are you going to use? What happens when that character appears in the data? You need an escape mechanism, such as a character that means ‘the next character is not the EOS marker’, and the escape character itself must then be escaped whenever it appears in the data. For example, in many programming languages' string literals, the backslash plays this role. At a terminal, control-V often serves the purpose.
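A sketch of such an escape scheme (often called byte stuffing), assuming a newline as the hypothetical EOS marker and a backslash as the hypothetical escape; `frame_encode` and `frame_decode` are illustrative names. The escape byte is escaped too, otherwise the decoder could not tell a literal backslash from an escape.

```c
#include <stddef.h>

#define MSG_EOS '\n'  /* hypothetical end-of-message marker */
#define MSG_ESC '\\'  /* hypothetical escape: next byte is literal data */

/* Encode `len` bytes of `in` into `out`, which must have room for
 * 2 * len + 1 bytes. Returns the encoded length, including the EOS. */
size_t frame_encode(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == MSG_EOS || in[i] == MSG_ESC)
            out[o++] = MSG_ESC;   /* escape the marker and the escape itself */
        out[o++] = in[i];
    }
    out[o++] = MSG_EOS;
    return o;
}

/* Decode one message from `in` (length `len`) into `out`, which must
 * have room for at least `len` bytes. Returns the decoded length and
 * stores the number of input bytes consumed in *used; returns -1 if no
 * unescaped EOS marker was found. */
long frame_decode(const unsigned char *in, size_t len,
                  unsigned char *out, size_t *used)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == MSG_ESC) {
            if (++i == len)
                return -1;        /* escape at end: message incomplete */
            out[o++] = in[i];     /* take the next byte literally */
        } else if (in[i] == MSG_EOS) {
            *used = i + 1;
            return (long)o;
        } else {
            out[o++] = in[i];
        }
    }
    return -1;
}
```

For the 5-byte message `a`, newline, `b`, backslash, `c`, the encoder produces 8 bytes. Note that the reader still has to examine every byte to find the end, which is part of why a length prefix tends to be simpler.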
Why is it hard for the sender to know the size of the buffer? And why would it need to ‘pad it out’?
If you are dealing with large amounts of data (say, kilobytes upwards), reading one character at a time is unlikely to yield acceptable performance. I think you would be best off having the sender determine the size of each packet and tell the reader, or designing the protocol so that there is a limit on the size of a packet. If you need to convey arbitrary amounts of data, have a protocol which says:
Also consider what will happen in future if, instead of using named pipes, you want to upgrade your system to work over a socket connection to another machine.
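One way to keep that upgrade path open is to make the length header a fixed-size integer in network byte order, so that a reader on a machine with different endianness decodes it correctly. The sketch below assumes a 4-byte length (so a packet carries at most 4 GiB minus one byte) and uses the hypothetical names `send_packet` and `recv_packet`. Because read() and write() work on stream sockets as well as pipes, the same functions should work unchanged on a connected socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `len` bytes, retrying after short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly `len` bytes; -1 on error or premature end of file. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one packet: 4-byte length in network byte order, then payload. */
int send_packet(int fd, const void *data, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (write_all(fd, &hdr, sizeof hdr) != 0)
        return -1;
    return write_all(fd, data, len);
}

/* Receive one packet into `buf` (capacity `cap`). Returns the payload
 * length, or -1 on error, end of file, or a packet larger than `cap`. */
ssize_t recv_packet(int fd, void *buf, size_t cap)
{
    uint32_t hdr;
    if (read_all(fd, &hdr, sizeof hdr) != 0)
        return -1;
    uint32_t len = ntohl(hdr);
    if (len > cap)
        return -1;                /* a real protocol might skip or reject it */
    if (read_all(fd, buf, len) != 0)
        return -1;
    return (ssize_t)len;
}
```

Here the header and payload are written with separate calls. If several processes write to the same FIFO, it would be safer to build header and payload in one buffer and send them with a single write() of at most PIPE_BUF bytes, so that packets from different writers cannot interleave.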
I think you should design your system with packets whose headers include the size of the data, as most networking protocols do (the IP header, for example, carries a length field). And if there’s a higher-level flow of data of unknown size, handle it along the lines outlined above. But even there, it is better if you can tell the overall size ahead of time.
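For a flow whose total size the sender does not know in advance, one common scheme (similar in spirit to HTTP/1.1 chunked transfer coding) is to send a series of length-prefixed chunks and mark the end with a zero-length chunk. The sketch below assumes that scheme and a 4-byte big-endian chunk length; it is one possibility among several, and `send_chunk`, `send_end` and `recv_chunk` are hypothetical names. It uses stdio streams, whose fread() and fwrite() keep transferring until the full count is done or an error or end of file occurs.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Send one non-empty chunk: 4-byte big-endian length, then the bytes.
 * The sender can call this as data is produced, with no total size. */
int send_chunk(FILE *out, const void *data, uint32_t n)
{
    uint32_t hdr = htonl(n);
    if (n == 0)
        return -1;        /* zero length is reserved for the end marker */
    if (fwrite(&hdr, sizeof hdr, 1, out) != 1 || fwrite(data, 1, n, out) != n)
        return -1;
    return 0;
}

/* Mark the end of the flow with a zero-length chunk, then flush. */
int send_end(FILE *out)
{
    uint32_t hdr = htonl(0);
    if (fwrite(&hdr, sizeof hdr, 1, out) != 1)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}

/* Read the next chunk into `buf` (capacity `cap`). Returns its length,
 * 0 at the end marker, or -1 on error, early EOF, or oversize chunk. */
long recv_chunk(FILE *in, void *buf, size_t cap)
{
    uint32_t hdr;
    if (fread(&hdr, sizeof hdr, 1, in) != 1)
        return -1;
    uint32_t n = ntohl(hdr);
    if (n > cap)
        return -1;
    if (n > 0 && fread(buf, 1, n, in) != n)
        return -1;
    return (long)n;
}
```

The reader calls recv_chunk() in a loop, processing each chunk as it arrives, and stops when it returns 0.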