I’ve just made a little test with loops in Java. I assumed that bit shifting in Java is usually faster than plain integer incrementing. So here is my sample code:
final int n = 16;
long n1 = System.nanoTime();
for (int i = 1; i < 1 << n; i <<= 1) {
// nothing
}
long n2 = System.nanoTime();
for (int i = 0; i < n; i++) {
// nothing
}
long n3 = System.nanoTime();
System.out.println("with shift = " + (n2 - n1) + " ns");
System.out.println("without shift = " + (n3 - n2) + " ns");
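As a sanity check on the snippet above, both loops run their bodies the same number of times. In the shift loop `i` takes the values 1, 2, 4, up to 2^15, so with n = 16 both loops should iterate 16 times. The class and method names in this small sketch are illustrative and not from the original code:

```java
// Counts how many times each loop from the question runs its body.
class LoopIterations {

    // Same shape as the first loop: i takes the values 1, 2, 4, ..., 2^(n-1).
    static int countShiftLoop(int n) {
        int count = 0;
        for (int i = 1; i < 1 << n; i <<= 1) {
            count++;
        }
        return count;
    }

    // Same shape as the second loop: i takes the values 0, 1, ..., n-1.
    static int countIncrementLoop(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        final int n = 16;
        System.out.println("shift loop iterations     = " + countShiftLoop(n));
        System.out.println("increment loop iterations = " + countIncrementLoop(n));
    }
}
```

So any timing difference between the two loops comes from the cost per iteration (or from measurement noise), not from a different number of iterations.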
My expectation was that the time between n1 and n2 would be shorter than the time between n2 and n3.
However, the incrementing loop turns out to be faster every time I run this snippet.
Here is the output of the code above:
with shift = 2445 ns
without shift = 1885 ns
with shift = 2374 ns
without shift = 1886 ns
with shift = 2374 ns
without shift = 1607 ns
Can someone please explain this behaviour? Does the answer lie in how the JVM compiles this code, or does it depend on the underlying architecture?
My system: Ubuntu Linux 3.5.0-17-generic i686 GNU/Linux
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Pentium(R) Dual-Core CPU T4300 @ 2.10GHz
stepping : 10
microcode : 0xa07
cpu MHz : 1200.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm dtherm
bogomips : 4189.42
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Pentium(R) Dual-Core CPU T4300 @ 2.10GHz
stepping : 10
microcode : 0xa07
cpu MHz : 1200.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm dtherm
bogomips : 4189.42
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
========== EDIT ===============
OK, so I updated my code to get a more reliable measurement.
My JVM:
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) Server VM (build 20.12-b01, mixed mode)
The new code:
// number of shifts
final int n = 16;
// recorded times
long n1 = 0, n2 = 0, n3 = 0, n4 = 0, n5 = 0;
// best (minimum) time measured for each loop
long withShiftFor = Long.MAX_VALUE;
long withoutShiftFor = Long.MAX_VALUE;
long withShiftWhile = Long.MAX_VALUE;
long withoutShiftWhile = Long.MAX_VALUE;
// value toggled inside each loop body
boolean b = true;
// repeat the measurement many times to get a better result
for (int x = 0; x < 2000000; x++) {
// for loop with shift
n1 = System.nanoTime();
for (int i = 1; i < 1 << n; i <<= 1) {
b = !b;
}
// for loop without shift
n2 = System.nanoTime();
for (int i = 0; i < n; i++) {
b = !b;
}
// while loop with shift
n3 = System.nanoTime();
int i = 1;
while (i < 1 << n) {
b = !b;
i <<= 1;
}
// while loop without shift
n4 = System.nanoTime();
int j = 0;
while (j < n) {
b = !b;
j++;
}
n5 = System.nanoTime();
// take minimal time to save best result
withShiftFor = Math.min(withShiftFor, n2 - n1);
withoutShiftFor = Math.min(withoutShiftFor, n3 - n2);
withShiftWhile = Math.min(withShiftWhile, n4 - n3);
withoutShiftWhile = Math.min(withoutShiftWhile, n5 - n4);
}
System.out.println("for with shift = " + withShiftFor + " ns");
System.out.println("for without shift = " + withoutShiftFor + " ns");
System.out.println("while with shift = " + withShiftWhile + " ns");
System.out.println("while without shift = " + withoutShiftWhile + " ns");
The new output after 3 runs (each run took more than 5 seconds):
for with shift = 907 ns
for without shift = 838 ns
while with shift = 907 ns
while without shift = 907 ns
for with shift = 907 ns
for without shift = 907 ns
while with shift = 907 ns
while without shift = 907 ns
for with shift = 907 ns
for without shift = 838 ns
while with shift = 907 ns
while without shift = 907 ns
So you were right: the results are nearly identical after a few seconds and many iterations. But why is the for loop without shifting faster than all the other variants? Is there some JVM optimization at work here, beyond the one line for the increment versus the four lines for the shift that you mentioned? And why is the while loop with incrementing no faster than the other loops?
When you run short loops, the code is interpreted. So if you are not going to run the code often, or you cannot warm it up, then you should benchmark it in that state and expect strange results like the ones you got.
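One way to see the switch from interpreted to compiled code is to time consecutive batches of calls and watch how the cost per batch changes. The sketch below assumes a HotSpot-style JVM; the class name, batch sizes and the `blackhole` field are illustrative choices, not part of the question. You would typically expect the early batches to be slower and later ones faster once compilation completes, but where (and whether) the drop appears depends on the JVM version and flags.

```java
// Sketch: time consecutive batches of calls to watch the cost change as the
// JVM moves from interpreting the method to running compiled code.
class WarmUpCurve {

    static int n = 16;               // non-final field, so not a compile-time constant
    static volatile long blackhole;  // volatile write keeps the results observable

    // Same loop shape as the question's shift loop, but accumulating a result.
    static int shiftLoop(int n) {
        int sum = 0;
        for (int i = 1; i < 1 << n; i <<= 1) {
            sum += i;
        }
        return sum;
    }

    // Returns the elapsed time in ns of each batch of batchSize calls.
    static long[] batchTimes(int batches, int batchSize) {
        long[] times = new long[batches];
        long sink = 0;
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int k = 0; k < batchSize; k++) {
                sink += shiftLoop(n);
            }
            times[b] = System.nanoTime() - start;
        }
        blackhole = sink;
        return times;
    }

    public static void main(String[] args) {
        long[] times = batchTimes(30, 1000);
        for (int b = 0; b < times.length; b++) {
            System.out.println("batch " + b + ": " + times[b] + " ns");
        }
    }
}
```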
If you want to compare compiled and optimised code, you should ignore the first 10K to 20K loops, as by default a loop needs to iterate 10K times to trigger compilation (and it is then compiled in the background, which takes a little time).
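A minimal sketch of discarding a warm-up phase before timing follows. The 20000 warm-up rounds are an assumption taken from the upper end of the 10K to 20K figure above; the real compile threshold depends on the JVM and its flags. The loops accumulate a sum that is written to a volatile field, which discourages (but does not guarantee against) the JIT removing the work; a dedicated harness such as JMH handles this more rigorously.

```java
// Sketch: run a warm-up phase without timing it, then measure.
class WarmUpBenchmark {

    static int n = 16;               // non-final, so not a compile-time constant
    static volatile long blackhole;  // volatile write keeps the results observable

    // Same loop shape as the question's shift loop, but accumulating a result.
    static int shiftLoop(int n) {
        int sum = 0;
        for (int i = 1; i < 1 << n; i <<= 1) {
            sum += i;
        }
        return sum;
    }

    // Same loop shape as the question's increment loop, accumulating a result.
    static int incrementLoop(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        final int warmUpRounds = 20000;     // assumption: upper end of 10K to 20K
        final int measuredRounds = 1000000;
        long sink = 0;

        // Warm-up phase: run both loops without timing them.
        for (int r = 0; r < warmUpRounds; r++) {
            sink += shiftLoop(n) + incrementLoop(n);
        }

        // Measurement phase: time many calls per block and report the average.
        long t0 = System.nanoTime();
        for (int r = 0; r < measuredRounds; r++) {
            sink += shiftLoop(n);
        }
        long t1 = System.nanoTime();
        for (int r = 0; r < measuredRounds; r++) {
            sink += incrementLoop(n);
        }
        long t2 = System.nanoTime();

        blackhole = sink;
        System.out.println("with shift    = " + (double) (t1 - t0) / measuredRounds + " ns/call");
        System.out.println("without shift = " + (double) (t2 - t1) / measuredRounds + " ns/call");
    }
}
```

Timing a million calls between two `nanoTime()` readings also spreads the timer's own cost over many calls instead of charging it to a single 16-iteration loop.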
In any case, I would also suggest running the test for at least 2 seconds to reduce variation.
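A test that runs for a minimum wall-clock duration can be sketched by repeating batches of calls until the time budget is used up and then reporting the average per call. The class name, the batch size of 1000 and the short 200 ms warm-up pass are illustrative assumptions; only the 2-second minimum comes from the suggestion above.

```java
import java.util.concurrent.TimeUnit;

// Sketch: keep repeating a measurement until a minimum duration has passed,
// then report the average time per call.
class TimedRun {

    static int n = 16;               // non-final, so not a compile-time constant
    static volatile long blackhole;  // volatile write keeps the results observable

    static int incrementLoop(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Runs incrementLoop in batches for at least minNanos; returns average ns per call.
    static double averageNanosPerCall(long minNanos) {
        final int batchSize = 1000;  // calls per nanoTime() check, to amortise its cost
        long calls = 0;
        long sink = 0;
        long start = System.nanoTime();
        long elapsed;
        do {
            for (int k = 0; k < batchSize; k++) {
                sink += incrementLoop(n);
            }
            calls += batchSize;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minNanos);
        blackhole = sink;
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        // Run once briefly to warm up, then measure for at least 2 seconds.
        averageNanosPerCall(TimeUnit.MILLISECONDS.toNanos(200));
        double avg = averageNanosPerCall(TimeUnit.SECONDS.toNanos(2));
        System.out.println("without shift = " + avg + " ns/call");
    }
}
```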
Your loops don’t do anything, so I would expect the JIT to eliminate them, and you end up just timing how long it takes to call System.nanoTime(), which can add 40 to 1000 ns depending on the system.
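To judge how much of a reading like the ~900 ns minimum in the edited results could be timer overhead alone, you can estimate the cost of `System.nanoTime()` on your own machine. This sketch calls it back to back and averages the gaps; the class and method names are illustrative, the figure includes a little loop overhead, and the 40 to 1000 ns range quoted above is not something this code reproduces by itself.

```java
// Sketch: estimate the average cost of one System.nanoTime() call on this machine.
class NanoTimeOverhead {

    // Calls nanoTime() back to back and returns the average gap between calls in ns.
    // The figure includes a little loop overhead, so treat it as a rough estimate.
    static double averageCallNanos(int samples) {
        long first = System.nanoTime();
        long last = first;
        for (int k = 0; k < samples; k++) {
            last = System.nanoTime();
        }
        return (double) (last - first) / samples;
    }

    public static void main(String[] args) {
        averageCallNanos(100000);  // warm-up pass, result ignored
        System.out.println("System.nanoTime() ~ " + averageCallNanos(1000000) + " ns per call");
    }
}
```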