In the memory based computing model, the only running time calculations that need to be done can be done abstractly, by considering the data structure.
However , there aren’t alot of docs on high performance disk I/o algorithms. Thus I ask the following set of questions:
1) How can we estimate running time of disk I/o operations? I assume there is a simple set of constants which we might add for looking up a value on disk, rather than in memory…
2) And more specifically, what is the difference between performance for accessing a specific index in a file? Is this a constant time operation? Or does it depend on how “far down” the index is?
3) Finally… how does the JVM optimize access of indexed portions of a file?
And… as far as resources — in general… Are there any good idioms or libraries for on disk data structure implementations?
In chapter 6 of Computer Systems: A Programmer’s Perspective they give a pretty practical mathematical model for how long it takes to read some data from a typical magnetic disk.
To quote the last page in the linked pdf:
*note, the linked pdf is from the authors website == no piracy
Of course, if the data being accessed was recently accessed, there’s a decent chance it’s cached somewhere in the memory heiarchy, in which case the access time is extremely small(practically, “near instant” when compared to disk access time).
Another seek + rotation amount of time may occur if the seeked location isnt stored sequentially nearby. It depends where in the file you’re seeking, and where that data is physically stored on the disk. For example, fragmented files are guaranteed to cause disk seeks to read the entire file.
Something to keep in mind is that even though you may only request to read a few bytes, the physical reads tend to occur in multiples of a fixed size chunks(the sector size), which ends up in cache. So you may later do a seek to some nearby location in the file, and get lucky that its already in cache for you.
Btw- The full chapter in that book on the memory hierarchy is pure gold, if you’re interested in the subject.