I need to read ~50 files on every server start and place each text file's contents into memory. Each text file will have its own string (what is the best type to use for the string holder?).
What is the fastest way to read the files into memory, and what is the best data structure/type to hold the text in so that I can manipulate it in memory (search and replace mainly)?
Thanks
A memory mapped file will be fastest… something like this:
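The original snippet did not survive, but a minimal sketch of the memory-mapped approach (class name, file path handling, and charset are assumptions, not the original code) might look like:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MappedRead {
    // Map the whole file into memory and decode it into a String.
    static String readMapped(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);  // copy the mapped region onto the heap
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```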
and then proceed to read from the byte buffer.
This will be significantly faster than FileInputStream or FileReader.

EDIT: After a bit of investigation with this, it turns out that, depending on your OS, you might be better off using a new BufferedInputStream(new FileInputStream(file)) instead. However, reading the whole thing all at once into a char[] the size of the file sounds like the worst approach.

So BufferedInputStream should give roughly consistent performance on all platforms, while the memory-mapped file may be slow or fast depending on the underlying OS. As with everything that is performance critical, you should test your code and see what works best.

EDIT:
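The BufferedInputStream variant described above can be sketched as follows (a minimal reconstruction, not the author's code; the class name and 8 KB chunk size are assumptions):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BufferedRead {
    // Read through a BufferedInputStream in fixed-size chunks rather
    // than allocating one array the size of the whole file up front.
    static String readBuffered(String path) throws IOException {
        try (BufferedInputStream in =
                 new BufferedInputStream(new FileInputStream(path))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];  // chunk size is an arbitrary choice
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toString(StandardCharsets.UTF_8.name());
        }
    }
}
```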
Ok here are some tests (the first one is done twice to get the files into the disk cache).
I ran it on the rt.jar class files, extracted to the hard drive, under Windows 7 beta x64. That is 16,784 files with a total of 94,706,637 bytes.
First the results…
(remember the first is repeated to get the disk cache setup)
ArrayTest
ArrayTest
DataInputByteAtATime
DataInputReadFully
MemoryMapped
Here is the code…
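The listing itself was not preserved. As a rough reconstruction, two of the cases above (DataInputReadFully and MemoryMapped) might have looked something like the sketch below; the directory walk, the Reader interface, and the timing details are assumptions:

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ReadBench {
    // "DataInputReadFully": one readFully call into a file-sized array.
    static byte[] readFully(File file) throws IOException {
        byte[] data = new byte[(int) file.length()];
        try (DataInputStream in =
                 new DataInputStream(new FileInputStream(file))) {
            in.readFully(data);
        }
        return data;
    }

    // "MemoryMapped": map the file and copy the bytes out of the buffer.
    static byte[] readMapped(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] data = new byte[buffer.remaining()];
            buffer.get(data);
            return data;
        }
    }

    interface Reader { byte[] read(File file) throws IOException; }

    // Walk a directory tree with the given strategy, summing bytes read.
    static long walk(File node, Reader reader) throws IOException {
        File[] children = node.listFiles();
        if (children == null) {
            return reader.read(node).length;  // a plain file
        }
        long total = 0;
        for (File child : children) {
            total += walk(child, reader);
        }
        return total;
    }

    // Time one strategy over the whole tree and report wall-clock ms.
    static void bench(String name, File root, Reader reader) throws IOException {
        long start = System.nanoTime();
        long bytes = walk(root, reader);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(name + ": " + bytes + " bytes in " + ms + " ms");
    }
}
```

Running each strategy twice over the same tree, as the results above do for the first case, lets the second pass measure warm-cache performance.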