Being bored earlier today, I started thinking about the relative performance of buffered and unbuffered byte streams in Java. As a simple test, I downloaded a reasonably large text file and wrote a short program to measure the effect that buffered streams have when copying the file. Four tests were performed:
- Copying the file using unbuffered input and output byte streams.
- Copying the file using a buffered input stream and an unbuffered output stream.
- Copying the file using an unbuffered input stream and a buffered output stream.
- Copying the file using buffered input and output streams.
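For readers who want to reproduce this, the four cases can be sketched roughly as below. This is not the original program, only a minimal sketch under a few assumptions: the class name `StreamCopyTimer` and its methods are made up for illustration, the copy is done one byte at a time, and `FileInputStream`/`FileOutputStream` are used as the unbuffered streams.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical harness: the names here are illustrative, not from the original code.
public class StreamCopyTimer {

    // Copies one byte at a time, so any buffering comes only from the stream wrappers.
    static long copy(InputStream in, OutputStream out) throws IOException {
        long copied = 0;
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
            copied++;
        }
        out.flush();
        return copied;
    }

    // Runs one copy with the requested wrappers and returns the elapsed time in seconds.
    static double timeCopy(File src, File dst, boolean bufferIn, boolean bufferOut) throws IOException {
        long start = System.nanoTime();
        try (InputStream rawIn = new FileInputStream(src);
             OutputStream rawOut = new FileOutputStream(dst);
             InputStream in = bufferIn ? new BufferedInputStream(rawIn) : rawIn;
             OutputStream out = bufferOut ? new BufferedOutputStream(rawOut) : rawOut) {
            copy(in, out);
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java StreamCopyTimer <input-file>");
            return;
        }
        File src = new File(args[0]);
        File dst = File.createTempFile("copy", ".tmp");
        dst.deleteOnExit();
        boolean[] options = {false, true};
        for (boolean bufferIn : options) {
            for (boolean bufferOut : options) {
                System.out.println((bufferIn ? "Buffered" : "Unbuffered") + " input, "
                        + (bufferOut ? "buffered" : "unbuffered") + " output");
                System.out.println("Time: " + timeCopy(src, dst, bufferIn, bufferOut));
            }
        }
    }
}
```

Timings from a sketch like this will vary with the machine, the filesystem, and whether the file is already in the OS cache, so a single run is only a rough indication.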
Unsurprisingly, using buffered input and output streams is orders of magnitude faster than using unbuffered streams. However, the really interesting thing (to me at least) was the difference in speed between cases 2 and 3, that is, between buffering only the input and buffering only the output. Some sample results are as follows:
Unbuffered input, unbuffered output
Time: 36.602513585
Buffered input, unbuffered output
Time: 26.449306847
Unbuffered input, buffered output
Time: 6.673194184
Buffered input, buffered output
Time: 0.069888689
For those interested, the code is available here at GitHub. Can anyone shed any light on why the times for cases 2 and 3 are so asymmetric?
When you read a file, the filesystem and the devices below it do various levels of caching. They almost never read one byte at a time; they read a block. On a subsequent read of the next byte, the block will already be in cache, so the read will be much faster.
It stands to reason, then, that if your buffer size is the same as the block size, buffering the input stream doesn’t gain you all that much: it saves system calls, but in terms of actual physical I/O it doesn’t save you much.
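To make the "saves system calls" part concrete, here is a rough sketch that counts how many times the underlying stream is asked for data, with and without a `BufferedInputStream` in front of it. It assumes an in-memory `ByteArrayInputStream` as a stand-in for the file, so it measures call counts only, not disk behaviour; `ReadCallCounter`, `CountingInputStream`, and the 8192-byte buffer size are choices made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: counts read calls that reach the wrapped stream.
public class ReadCallCounter {

    static class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads totalBytes one byte at a time and reports how many calls hit the underlying stream.
    static long countUnderlyingReads(int totalBytes, boolean buffered) throws IOException {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        // 8192 is an illustrative buffer size, not a measured block size.
        InputStream in = buffered ? new BufferedInputStream(counter, 8192) : counter;
        while (in.read() != -1) {
            // discard the data; only the call count matters here
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Unbuffered: " + countUnderlyingReads(100_000, false) + " underlying reads");
        System.out.println("Buffered:   " + countUnderlyingReads(100_000, true) + " underlying reads");
    }
}
```

With a real file, each underlying read on an unbuffered stream is a call into the OS, but most of those calls are served from a block that is already cached, which fits the modest gain seen for input buffering alone.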
When you write a file, the filesystem can’t cache for you because you haven’t given it a backlog of things to write. It could potentially buffer the output for you, but it has to make an educated guess at how often to flush the buffer. By buffering the output yourself, you let the device do much more work at once because you manually build up that backlog.
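The "building up a backlog" idea can be sketched the same way for output: count how many writes actually reach the underlying stream when bytes are written one at a time. As before, this is only an illustration under assumptions: the sink is an in-memory `ByteArrayOutputStream` rather than a file, and `WriteCallCounter`, `CountingOutputStream`, and the 8192-byte buffer size are made up for the example.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: counts write calls that reach the wrapped stream.
public class WriteCallCounter {

    static class CountingOutputStream extends FilterOutputStream {
        long calls = 0;

        CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            calls++;
            out.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            calls++;
            // Forward the whole chunk in one call instead of byte by byte.
            out.write(b, off, len);
        }
    }

    // Writes totalBytes one byte at a time and reports how many calls hit the underlying stream.
    static long countUnderlyingWrites(int totalBytes, boolean buffered) throws IOException {
        CountingOutputStream counter = new CountingOutputStream(new ByteArrayOutputStream());
        // 8192 is an illustrative buffer size.
        OutputStream out = buffered ? new BufferedOutputStream(counter, 8192) : counter;
        for (int i = 0; i < totalBytes; i++) {
            out.write(i);
        }
        out.flush(); // push any bytes still sitting in the buffer
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Unbuffered: " + countUnderlyingWrites(100_000, false) + " underlying writes");
        System.out.println("Buffered:   " + countUnderlyingWrites(100_000, true) + " underlying writes");
    }
}
```

In the unbuffered case every byte becomes its own write to the sink, while the buffered case hands over large chunks, which is the backlog the device can then work through at once.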