The attached program (see at the end), when executed, yields the following output:
..........
with sleep time of 0ms
times= [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
average= 0.7
..........
with sleep time of 2000ms
times= [2, 2, 2, 2, 2, 1, 2, 2, 2, 2]
average= 1.9
In both cases the exact same code is executed: it repeatedly gets the next value from a Random object that is instantiated at the start of the program. The warm-up method, executed first, is supposed to trigger any JIT optimizations before the actual testing begins.
Can anyone explain the reason for this difference? I have been able to reproduce this result on my machine every time so far. It was executed on a multi-core Windows system with Java 7.
One interesting thing is that if the order in which the tests are executed is reversed, that is, if we run the loop with the delay before the loop without the delay, then the timings are more similar (with the no delay loop actually taking longer):
..........
with sleep time of 2000ms
times= [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
average= 2.0
..........
with sleep time of 0ms
times= [2, 3, 3, 2, 3, 3, 2, 3, 2, 3]
average= 2.6
As far as I can tell, no object is being created inside the operation method, and when running this through a profiler it does not seem that garbage collection is ever triggered. A wild guess is that some value gets cached in a processor-local cache which is flushed when the thread is put to sleep, so that when the thread wakes up it has to retrieve the value from main memory, which is slower. That, however, does not explain why inverting the order makes a difference…
The real-life situation where I initially observed this behavior (which prompted me to write this sample test class) was XML unmarshalling. I noticed that unmarshalling the same document repeatedly in quick succession yielded better times than doing the same thing with a delay between calls to unmarshal (whether the delay was generated through sleep or manually).
Here is the code:
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Tester
{
    public static void main(String[] args) throws InterruptedException
    {
        warmUp(10000);
        int numRepetitions = 10;
        runOperationInALoop(numRepetitions, 0);
        runOperationInALoop(numRepetitions, 2000);
    }

    private static void runOperationInALoop(int numRepetitions, int sleepTime) throws InterruptedException
    {
        List<Long> times = new ArrayList<Long>(numRepetitions);
        long totalDuration = 0;
        for(int i=0; i<numRepetitions; i++)
        {
            Thread.sleep(sleepTime);
            long before = System.currentTimeMillis();
            someOperation();
            long duration = System.currentTimeMillis() - before;
            times.add(duration);
            totalDuration = totalDuration + duration;
            System.out.print(".");
        }
        System.out.println();
        double averageTimePerOperation = totalDuration/(double)numRepetitions;
        System.out.println("with sleep time of " + sleepTime + "ms");
        System.out.println(" times= " + times);
        System.out.println(" average= " + averageTimePerOperation);
    }

    private static void warmUp(int warmUpRepetitions)
    {
        for(int i=0; i<warmUpRepetitions; i++)
        {
            someOperation();
        }
    }

    public static int someInt;
    public static Random random = new Random(123456789L);

    private static void someOperation()
    {
        for(int j=0; j<50000; j++)
        {
            someInt = ((int)random.nextInt()*10) + 1;
        }
    }
}
When you sleep, even for a short period of time (you may find that 10 ms is long enough), you give up the CPU, and the data, instruction and branch-prediction caches are disturbed or even cleared. Even making a system call like System.currentTimeMillis() or the much more accurate System.nanoTime() can do this to a small degree.
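Since the timings in the question are only a millisecond or two, System.currentTimeMillis() is too coarse to show much detail. Here is a minimal sketch of the same measurement using System.nanoTime(). The names NanoTimer and timeNanos are made up for illustration, and the warm-up count is reduced from the question's 10000 only to keep the example quick.

```java
import java.util.Random;

public class NanoTimer {
    static int someInt;
    static final Random random = new Random(123456789L);

    // Same workload as someOperation() in the question.
    static void someOperation() {
        for (int j = 0; j < 50000; j++) {
            someInt = random.nextInt() * 10 + 1;
        }
    }

    // Runs the task once and returns the elapsed time in nanoseconds.
    static long timeNanos(Runnable task) {
        long before = System.nanoTime();
        task.run();
        return System.nanoTime() - before;
    }

    public static void main(String[] args) {
        Runnable op = new Runnable() {
            public void run() {
                someOperation();
            }
        };
        // Warm up so the JIT has a chance to compile the loop
        // (hypothetical count, smaller than in the question).
        for (int i = 0; i < 1000; i++) {
            op.run();
        }
        // Print each timing in microseconds instead of whole milliseconds.
        for (int i = 0; i < 10; i++) {
            System.out.println(timeNanos(op) / 1000 + " us");
        }
    }
}
```

With microsecond figures it should be easier to see how much of the gap between the two runs is real and how much is rounding to whole milliseconds.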
AFAIK, the only way to avoid giving up the core is to busy-wait and use thread affinity to lock your thread to a core. This minimises such disturbance and means your program can run 2-5x faster in low-latency situations, i.e. when sub-millisecond tasks matter.
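A rough sketch of the busy-wait half of that, assuming plain Java with no extra libraries. The names BusyWait and busyWaitNanos are hypothetical. Thread affinity is not shown because it needs native support outside the standard library.

```java
public class BusyWait {
    // Spins until at least delayNanos have elapsed. The thread never
    // voluntarily gives up the CPU, although the OS may still preempt it.
    static void busyWaitNanos(long delayNanos) {
        long start = System.nanoTime();
        while (System.nanoTime() - start < delayNanos) {
            // intentionally empty: keep the core busy
        }
    }

    public static void main(String[] args) {
        long before = System.nanoTime();
        busyWaitNanos(2000000L); // wait roughly 2 ms without sleeping
        long waited = System.nanoTime() - before;
        System.out.println("waited " + waited / 1000 + " us");
    }
}
```

To try it on the test in the question, you could replace Thread.sleep(sleepTime) with busyWaitNanos(sleepTime * 1000000L) and compare the timings. Keep in mind this burns a whole core for the duration of the wait, so it only makes sense when latency matters more than CPU usage.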
For your interest
http://vanillajava.blogspot.co.uk/2012/01/java-thread-affinity-support-for-hyper.html
http://vanillajava.blogspot.co.uk/2012/02/how-much-difference-can-thread-affinity.html