I made a toy program to test Java’s concurrency performance. I put it here:
https://docs.google.com/open?id=0B4e6u_s5iHT6MTNkZGM5ODQtNjZmYi00NTMwLWJlMjUtYzViOWZlMDM5NGVi
It accepts an integer argument that indicates how many threads to use. The program just finds the prime numbers in a range. A generic version is obtained by commenting out lines 44~53, and it shows nearly perfect scalability.
However, when I uncomment lines 44~53, which do a simple local computation, and set the variable s to a large enough value, the scalability may disappear.
My question is whether my toy program uses shared data that could degrade concurrency performance. And how can the lost scalability be explained (I suspect low-level overhead, such as garbage collection)? Is there a solution for problems like this?
The code in question is:
Of course this will degrade performance if you increase the value of
s, since s controls how many things you put into the list. But that has very little to do with concurrency or scalability. If you write code telling the computer to waste time doing thousands or millions of throw-away computations, then of course your performance will degrade.

In more technical terms, the time complexity of this section of code is
O(2n) (it takes n operations to build the list, and then n operations to iterate it and increment each value), where n is equal to s; dropping the constant factor, that is O(n). So the bigger you make s, the longer it will take to execute this code.

In terms of why this would seem to make the benefits of concurrency smaller, have you considered the memory implications as
s becomes larger? For instance, are you sure the Java heap is large enough to hold everything in memory without anything getting swapped out to disk? And even if nothing is getting swapped out, by making the length of the ArrayList larger you are giving the garbage collector more work to do when it runs (and possibly increasing the frequency at which it runs). Note that depending upon the implementation, the garbage collector may be pausing all of your threads each time it runs.

I wonder, if you allocate a single
ArrayList instance per thread, at the time the thread is created, and then reuse that in the call to isPrime() instead of creating a new list each time, does that improve things?

Edit: Here’s a fixed up version: http://pastebin.com/6vR7Uhez
It gives the following output on my machine:
…which shows nearly linear scaling as the number of threads is ramped up. The problems that I fixed were a combination of points raised above and in John Vint’s (now deleted) answer, as well as incorrect/unnecessary use of
ConcurrentLinkedQueue structures and some questionable timing logic.

If we enable GC logging and profile both versions, we can see that the original version spends about 10x as much time running garbage collection as the modified version:
Which implies to me that between the constant list allocations and
Integer autoboxing, the original implementation was simply churning through too many objects, which placed too much load on the GC and degraded the performance of your threads to the point where there was no benefit (or even a negative benefit) to creating more threads.

So all this says to me is that if you want to get good scaling out of concurrency in Java, whether your task is large or small, you have to pay attention to how you are using memory, be aware of potentially hidden pitfalls and inefficiencies, and optimize away the inefficient bits.
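To make the allocation point concrete, here is a minimal sketch of the two patterns side by side. It is not the code from the question or from the pastebin link. The class name BufferReuseSketch, the methods churn and reuse, the constant S (standing in for s), and the call counts are all made up for illustration. It compares a fresh boxed list per call against one primitive buffer per thread, and uses the standard GarbageCollectorMXBean counters as a rough stand-in for reading the GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class BufferReuseSketch {
    // Hypothetical list size, standing in for the variable s in the question.
    static final int S = 10_000;

    // One primitive buffer per thread, created the first time that thread asks for it.
    static final ThreadLocal<int[]> BUFFER = ThreadLocal.withInitial(() -> new int[S]);

    // Assumed shape of the original pattern: a new boxed list on every call.
    static long churn(int s) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < s; i++) {
            list.add(i);                      // autoboxes int -> Integer
        }
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            list.set(i, list.get(i) + 1);     // unbox, increment, box again
            sum += list.get(i);
        }
        return sum;
    }

    // Same arithmetic on a caller-supplied primitive buffer: nothing allocated per call.
    static long reuse(int[] buf) {
        for (int i = 0; i < buf.length; i++) {
            buf[i] = i;
        }
        long sum = 0;
        for (int i = 0; i < buf.length; i++) {
            buf[i]++;
            sum += buf[i];
        }
        return sum;
    }

    // Total time the JVM reports having spent in GC so far, in milliseconds.
    static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // -1 means "not available"
        }
        return total;
    }

    // Runs work on the given number of threads, calls times each,
    // and returns the GC milliseconds accrued while they ran.
    static long gcMillisFor(int threads, int calls, Runnable work) throws InterruptedException {
        long before = gcMillis();
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> {
                for (int c = 0; c < calls; c++) {
                    work.run();
                }
            });
            pool[t].start();
        }
        for (Thread th : pool) {
            th.join();
        }
        return gcMillis() - before;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        System.out.println("GC ms, new list per call: "
                + gcMillisFor(threads, 2_000, () -> churn(S)));
        System.out.println("GC ms, reused buffer:     "
                + gcMillisFor(threads, 2_000, () -> reuse(BUFFER.get())));
    }
}
```

The numbers will vary by JVM, collector, and heap settings, and the JIT may optimize parts of a loop this simple, so treat the output as a rough indication rather than a benchmark. The idea is only that the second variant gives the collector far fewer short-lived objects to deal with, which is the same effect the fixed-up version relies on.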