I have an algorithm that performs some file I/O (reading, writing) and computation.
If I write to tape (not read), the algorithm works great. If I read from tape (no writing), the performance is poor. If tape is taken out of the equation (just disk for I/O), then it works great.
Now, I’ve boiled it down to a relatively simple case that I’m trying to understand.
The setup is a single, 20 GB file on tape. I am reading this file in blocks, sequentially.
The test algorithm is something like:
while (fileRemaining)
{
    ReadBlock(blockSize);
    Sleep(sleepTime); // this is to mimic computation time
}
Some observations:
- When using a blockSize of 8K, and sleepTime of 0, the throughput (data read/second) is good. Further, the tape drive is constantly making noise.
- When using a blockSize of 8K, and any non-zero sleepTime (even 1ms), the throughput suffers horribly. Data still gets read, but the tape drive does not regularly make noise. It becomes silent for a while with occasional noises.
- When using a blockSize of 2M, and a sleepTime of 100ms, the throughput is good. The tape drive makes noise the entire time (although it sounds as if it is running at a slower speed).
- Windows Explorer is able to transfer the file from tape to disk with good throughput.
How do I get good read performance here?
If you would be so kind as to help me understand the other mysteries as well: Why does the presence of a Sleep throw off the throughput so significantly (knowing this could help me re-think the algorithm)? What’s the “optimal” amount to read from tape at a time? Is the noise coming from the tape drive even relevant?
You haven’t given any details of the tape media, drive or interface type the drive is using.
Current technology like LTO4/5 is capable of delivering data at around 120 – 140MB/s native (the often-quoted 240 – 280MB/s figures assume 2:1 compression). Performance is achieved by reading in an optimum block size; for LTO I believe this is 64KB. Block sizes up to 256KB do not change performance significantly, but reading lots of small blocks will hurt it. Read/write in bigger blocks and split the data up within your program once you’ve read it in. If the data is already on the tape in 8KB blocks then set the drive into fixed block mode and read multiple 8KB blocks per call.
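To illustrate the "read big, split in your program" advice, here is a minimal Python sketch. It is not tape-specific: `iter_records`, `CHUNK_SIZE` and `RECORD_SIZE` are names made up for this example, the 256KB chunk size is just an assumed starting point, and it assumes the stream returns full chunks until end of file (true for an in-memory or buffered file object, not guaranteed for a raw device handle).

```python
import io

RECORD_SIZE = 8 * 1024      # size the algorithm wants to work with
CHUNK_SIZE = 256 * 1024     # size of each read issued to the device (assumed value)


def iter_records(stream, chunk_size=CHUNK_SIZE, record_size=RECORD_SIZE):
    """Yield record_size pieces while issuing only chunk_size reads.

    Assumes chunk_size is a multiple of record_size and that read()
    returns a full chunk except at end of file.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file
        view = memoryview(chunk)  # slice without copying the whole chunk
        for offset in range(0, len(chunk), record_size):
            # The final record may be shorter than record_size.
            yield bytes(view[offset:offset + record_size])


if __name__ == "__main__":
    # Stand-in for the tape file: 1 MiB of in-memory data.
    fake_tape = io.BytesIO(b"\x00" * (1024 * 1024))
    count = sum(1 for _ in iter_records(fake_tape))
    print("records read:", count)
```

The rest of the algorithm still sees 8K records, but the device only sees a small number of large reads.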
Tape drives have to reach a specific tape speed to read data. If the data is not taken from the drive fast enough then the drive will have to slow down, stop, rewind, reposition, get back up to speed and then start reading again. This stopping and starting will have a significant impact on performance. LTO tries to compensate for this by being able to read at different tape speeds, but there are limits.
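A rough way to see why the Sleep matters is to work out the highest rate at which the test loop can possibly consume data: one block per sleep interval. The sketch below does that arithmetic for the cases in the question. `max_throughput_mb_s` is a name invented for this example, and the 15.6ms row is an assumption: on Windows with the default timer resolution, Sleep(1) often lasts closer to 15ms than 1ms.

```python
def max_throughput_mb_s(block_size_bytes, sleep_seconds):
    """Upper bound on read rate (MB/s) when every read is followed by a sleep.

    The time taken by the read itself is ignored, so the real rate is lower.
    """
    if sleep_seconds <= 0:
        return float("inf")  # no sleep: limited only by the drive and interface
    return block_size_bytes / sleep_seconds / 1e6


if __name__ == "__main__":
    cases = [
        ("8K block, 1ms sleep", 8 * 1024, 0.001),
        ("8K block, 15.6ms sleep (assumed real Sleep(1))", 8 * 1024, 0.0156),
        ("2M block, 100ms sleep", 2 * 1024 * 1024, 0.100),
    ]
    for label, block, sleep in cases:
        print(f"{label}: at most {max_throughput_mb_s(block, sleep):.2f} MB/s")
```

With 8K blocks the loop cannot take more than about 8MB/s even if the sleep really is 1ms, while 2M blocks with a 100ms sleep allow about 21MB/s. If the ceiling is below the slowest speed at which the drive can stream (a figure that depends on the drive and is not given here), the drive has to stop and reposition, which would be consistent with the silent periods described in the question.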
Further speed improvements can be achieved using asynchronous I/O; however, I don’t believe this is necessary for this application.
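If overlapping the reads with the computation does turn out to be worthwhile, one simple approximation is a background thread that keeps reading ahead into a bounded queue. This is only a sketch in Python and is not true OS-level asynchronous (overlapped) I/O; `read_ahead`, the chunk size and the queue depth are all assumptions made for the example.

```python
import io
import queue
import threading


def read_ahead(stream, chunk_size=256 * 1024, depth=8):
    """Yield chunks from stream while a background thread keeps reading ahead."""
    chunks = queue.Queue(maxsize=depth)  # bounded, so memory use stays capped
    end = object()                       # sentinel marking the end of the stream

    def producer():
        try:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                chunks.put(chunk)        # blocks while the queue is full
        finally:
            chunks.put(end)              # always wake the consumer, even on error

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = chunks.get()
        if item is end:
            return
        yield item


if __name__ == "__main__":
    # Stand-in for the tape file: 4 MiB of in-memory data.
    source = io.BytesIO(b"\x00" * (4 * 1024 * 1024))
    total = sum(len(chunk) for chunk in read_ahead(source))
    print("bytes read:", total)
```

The consumer can spend time computing on one chunk while the thread is already fetching the next ones, so the drive is asked for data continuously instead of only between sleeps.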