I’ve switched my application from a single-threaded to a multi-threaded routine.
This works fine in the JUnit tests. With 10 threads the test needs 195 ms to complete, and with only one thread it takes 406 ms. So there clearly is a performance advantage.
But when running it on the server, the application now needs much longer than when it was only single threaded.
Basically, my application reads a line from a CSV file, puts one of its values into a set and prints the line to another file.
The input file in the JUnit tests is about 35 lines long, the one on the server about 6 000 000 lines long.
The set in which those values are put is a synchronized HashSet which can contain Long objects.
I’m monitoring my application with the Java VisualVM but unfortunately I don’t know what to look for.
Do you have any hints for me on how to solve this performance crisis?
P. S.: Most of the time my threads are marked as waiting, but I don’t know if they are really waiting or if they are just too fast for the Java VisualVM to display it.
To further clarify my routine: I read the file single threaded, but as soon as the line is read I pass the resulting object to a Runnable that puts it into a set and prints it into another file. Meanwhile the next lines are read and passed to other threads.
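To make the routine described above concrete, here is a minimal sketch of it. It is not my actual code: the class name PipelineSketch, the `;` separator and the assumption that the value is the first column are made up for illustration, and a Writer stands in for the output file.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PipelineSketch {

    // Shared, synchronized set of the values seen so far.
    private final Set<Long> seen = Collections.synchronizedSet(new HashSet<Long>());
    private final Writer out;

    public PipelineSketch(Writer out) {
        this.out = out;
    }

    // All workers funnel their output through this one synchronized method.
    private synchronized void writeLine(String line) {
        try {
            out.write(line);
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Work done per line: parse one value (assumed: first column), store it, copy the line.
    private void handle(String line) {
        long value = Long.parseLong(line.substring(0, line.indexOf(';')));
        seen.add(value);
        writeLine(line);
    }

    // The calling thread plays the role of the single reader; the pool does the rest.
    public Set<Long> process(Iterable<String> lines, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String line : lines) {
            pool.execute(() -> handle(line));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return seen;
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        List<String> lines = Arrays.asList("7070927;a", "9058759;b", "7070927;c");
        Set<Long> seen = new PipelineSketch(out).process(lines, 10);
        System.out.println(seen.size() + " distinct values");
    }
}
```

Note that in this arrangement the order of the lines in the output is not guaranteed to match the input, and every worker has to pass through the same two locks (the set and the writer).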
As I can see in my log file, the threads are doing something and aren’t just waiting. But there are certain jumps, periods longer than 100 ms in which nothing happens.
One of those jumps:
2011-04-08 12:27:16,580 DEBUG [Thread-10] runnables.Runner - 7070927
2011-04-08 12:27:16,580 DEBUG [Thread-10] runnables.Runner - 9058759
2011-04-08 12:27:16,580 DEBUG [Thread-10] runnables.Runner - 7030928
2011-04-08 12:27:16,580 DEBUG [Thread-10] runnables.Runner - 15301035
2011-04-08 12:27:16,684 DEBUG [Thread-10] runnables.Runner - 7700929
2011-04-08 12:27:16,684 DEBUG [Thread-10] runnables.Runner - 17116545
2011-04-08 12:27:16,685 DEBUG [Thread-10] runnables.Runner - 4933581
2011-04-08 12:27:16,685 DEBUG [Thread-10] runnables.Runner - 2861116
Note: No GC happened at that time.
As written in a comment below: I am using a thread pool. My threads are fighting* over the same output file. They all write to a synchronized method.
Even if I reduce the size of my thread pool to one, the performance is still horrible. Nothing compared to the previous implementation. Wouldn’t that rule out things like IO dependency or thread switching?
I’ve modified my code now so that inside the Runnable nearly nothing is done. No Set, no writing. Just one log statement. And still I get those jumps.
So I rule out the writing or Set problem proposed by some. And when running only one thread, I also got these idle times. So thread switching doesn’t seem to be the problem either.
I don’t know exactly what the problem was, but it seems that it was caused by a bad implementation of the Executor interface. I’m now using
and everything is working fine.
17.12 min
10-threaded routine: 13.45 min
I found the bad piece of code:
was invoked when the thread queue was full.
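For readers wondering what handling a full queue can look like: the sketch below is not the code referred to above, just one common way to deal with it using the standard java.util.concurrent classes. It assumes a bounded ArrayBlockingQueue together with ThreadPoolExecutor.CallerRunsPolicy, so that the submitting (reader) thread runs a task itself whenever the queue is full, instead of sleeping or polling. The class name BoundedPoolSketch and the sizes are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolSketch {

    // Builds a fixed-size pool whose work queue holds at most queueCapacity tasks.
    public static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                // When the queue is full, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits the given number of trivial tasks and returns how many actually ran.
    public static int runTasks(int tasks, int threads, int queueCapacity)
            throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        ThreadPoolExecutor pool = newBoundedPool(threads, queueCapacity);
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(100_000, 10, 100) + " tasks completed");
    }
}
```

With this policy a full queue slows the reader down naturally and no task is dropped, which may or may not be the right trade-off for a given workload.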