I have a program written in Java that reads in a file that is simply a list of strings into a LinkedHashMap. Then it takes a second file which consists of two columns and for each row see if the right-hand term matches one of the terms from the HashMap. The problem is it’s running very slow.
Here’s a code snippet, this is where it compares the second file to the HashMap terms:
String output = "";
infile = new File("2columns.txt");
try {
in = new BufferedReader(new FileReader(infile));
} catch (FileNotFoundException e2) {
System.out.println("2columns.txt" + " not found");
}
try {
fw = new FileWriter("newfile.txt");
out = new PrintWriter(fw);
try {
String str = in.readLine();
while (str != null) {
StringTokenizer strtok = new StringTokenizer(str);
strtok.nextToken();
String strDest = strtok.nextToken();
System.out.println("Term = " + strDest);
//if (uniqList.contains(strDest)) {
if (uniqMap.get(strDest) != null) {
output += str + "\r\n";
System.out.println("Matched! Added: " + str);
}
str = in.readLine();
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
out.print(output);
I got a performance boost from switching from an ArrayList initially to the LinkedHashMap but it’s still taking a long time. What can I do to speed this up?
Your major bottleneck may be that you are recreating a StringTokenizer for every iteration of the while loop. Moving this outside the loop could help considerably. Minor speed ups can be obtained by moving the String definition outside the while loop.
The biggest speedup will probably come from using a StreamTokenizer. See below for an example.
Oh and use a HashMap instead of a LinkedHashMap as @Doug Ayers says in the above comments 🙂
And @ MДΓΓ БДLL’s suggestion of profiling your code is bang on. checkout this Eclipse Profiling Example
One final thought is (and I’m not confident on this one) that writing to a file may also be faster if you do it in one go at the end. i.e. store all your matches in a buffer of some sort and do the writing in one hit.