I want to read the contents of files in Java. I have about 8000 files to read, and I want to store the results in a HashMap as (path, contents). I think using threads would be an option to speed this up.
As far as I know, reading all 8000 files in separate threads at once is not feasible (we probably want to limit the number of threads). Any comments on that? I am also new to threading in Java; can anyone help me get started?
Here is the rough code I have so far:
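import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ThreadingTest {
    // Maps each file path to that file's contents.
    public final Map<String, String> contents = new HashMap<>();

    public ThreadingTest(List<String> paths) throws IOException {
        for (String path : paths) {
            // Each path points to one file. For now the files are read one
            // after another. This loop is where I would like to hand each
            // path to a thread, but I am not sure how to start the threads
            // or how to limit how many run at once.
            contents.put(path, readFile(path));
        }
    }

    // Reads the whole file at the given path into a single string.
    public String readFile(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() strips the line terminator, so add it back.
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}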
Any help completing the threading part would be appreciated. Thanks.
Short answer: Read the files sequentially. Disk I/O doesn’t parallelize well.
Long Answer: Threading might improve read performance if the disks are good at random access (SSDs are) or if the files are spread across several different disks. Otherwise you are likely to end up with a lot of cache misses and time spent waiting for the disk to seek to the right read position. (You may still end up there even if your disks are good at random access.)
If you want to measure instead of guess, use Executors.newFixedThreadPool to create an ExecutorService that can read your files in parallel. Experiment with different thread counts, but don’t be surprised if one reader thread per physical disk gives you the best performance.
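A minimal sketch of that experiment might look like the following. The class name ParallelFileReader and the method readAll are hypothetical names chosen for illustration, and the code assumes the files are UTF-8 text and small enough to hold in memory. The pool size caps how many files are read at once, so all 8000 tasks can be submitted without creating 8000 threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class; the names here are illustrative only.
class ParallelFileReader {

    // Reads every file using at most threadCount worker threads and
    // returns a map from path to file contents.
    static Map<String, String> readAll(List<String> paths, int threadCount)
            throws InterruptedException, ExecutionException {
        // ConcurrentHashMap is safe to fill from several threads at once.
        Map<String, String> contents = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String path : paths) {
                // Tasks queue up; at most threadCount of them run at a time.
                pending.add(pool.submit(() -> {
                    try {
                        byte[] bytes = Files.readAllBytes(Paths.get(path));
                        // Assumption: the files are UTF-8 encoded text.
                        contents.put(path, new String(bytes, StandardCharsets.UTF_8));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> task : pending) {
                // Waits for the task; a failed read surfaces as ExecutionException.
                task.get();
            }
        } finally {
            pool.shutdown();
        }
        return contents;
    }

    // Times the same set of files (passed as arguments) with several pool sizes.
    public static void main(String[] args) throws Exception {
        List<String> paths = Arrays.asList(args);
        for (int threads : new int[] {1, 2, 4, 8}) {
            long start = System.nanoTime();
            Map<String, String> contents = readAll(paths, threads);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): "
                    + contents.size() + " files in " + elapsedMs + " ms");
        }
    }
}
```

A pool size of 1 gives you the sequential baseline to compare against. Keep in mind that the operating system caches file data, so runs after the first may look faster regardless of the thread count; treat the numbers with some care.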