Look at these loops:
const int arrayLength = ...
Version 0
public void RunTestFrom0()
{
    int sum = 0;
    for (int i = 0; i < arrayLength; i++)
        for (int j = 0; j < arrayLength; j++)
            for (int k = 0; k < arrayLength; k++)
                for (int l = 0; l < arrayLength; l++)
                    for (int m = 0; m < arrayLength; m++)
                    {
                        sum += myArray[i][j][k][l][m];
                    }
}
Version 1
public void RunTestFrom1()
{
    int sum = 0;
    for (int i = 1; i < arrayLength; i++)
        for (int j = 1; j < arrayLength; j++)
            for (int k = 1; k < arrayLength; k++)
                for (int l = 1; l < arrayLength; l++)
                    for (int m = 1; m < arrayLength; m++)
                    {
                        sum += myArray[i][j][k][l][m];
                    }
}
Version 2
public void RunTestFrom2()
{
    int sum = 0;
    for (int i = 2; i < arrayLength; i++)
        for (int j = 2; j < arrayLength; j++)
            for (int k = 2; k < arrayLength; k++)
                for (int l = 2; l < arrayLength; l++)
                    for (int m = 2; m < arrayLength; m++)
                    {
                        sum += myArray[i][j][k][l][m];
                    }
}
Results for arrayLength=50 (averaged over multiple runs, compiled for x64). Version 3 is not shown above; its loop count matches the same pattern with every loop starting at 3:
- Version 0: 0.998s (Standard error of the mean 0.001s) total loops: 312500000
- Version 1: 1.449s (Standard error of the mean 0.000s) total loops: 282475249
- Version 2: 0.774s (Standard error of the mean 0.006s) total loops: 254803968
- Version 3: 1.183s (Standard error of the mean 0.001s) total loops: 229345007
With arrayLength=45:
- Version 0: 0.495s (Standard error of the mean 0.003s) total loops: 184528125
- Version 1: 0.527s (Standard error of the mean 0.001s) total loops: 164916224
- Version 2: 0.752s (Standard error of the mean 0.001s) total loops: 147008443
- Version 3: 0.356s (Standard error of the mean 0.000s) total loops: 130691232
Why:
- is the loop starting from 0 faster than the loop starting from 1, even though it runs more iterations?
- does the loop starting from 2 behave so strangely?
Update:
- I did each run 10 times (that's where the standard error of the mean comes from).
- I also switched the order of the version tests a couple of times. No big difference.
- The length of myArray in each dimension equals arrayLength. I initialized it at the beginning, and the time taken for that is excluded. Every value is 1, so sum gives the total number of loops.
- The compiled version is a Release build, and I ran it outside VS (with VS closed).
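The two bookkeeping figures in the results (total loops and standard error of the mean) can be reproduced with a small helper. This is only a sketch: the class and method names are hypothetical, and it assumes the standard error was computed as the sample standard deviation divided by the square root of the number of runs, which the post does not state.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original post.
public static class RunStats
{
    // Iterations of five nested loops that each run from start to length - 1.
    public static long TotalLoops(int length, int start)
    {
        long perLoop = length - start;
        return perLoop * perLoop * perLoop * perLoop * perLoop;
    }

    // Standard error of the mean, assuming the usual definition:
    // sample standard deviation (n - 1 denominator) divided by sqrt(n).
    public static double StandardError(double[] samples)
    {
        int n = samples.Length;
        if (n < 2)
            throw new ArgumentException("Need at least two samples.", nameof(samples));

        double mean = samples.Average();
        double sumOfSquares = samples.Sum(x => (x - mean) * (x - mean));
        double sampleStdDev = Math.Sqrt(sumOfSquares / (n - 1));
        return sampleStdDev / Math.Sqrt(n);
    }
}
```

For example, RunStats.TotalLoops(50, 1) is 49^5 = 282475249, which matches the Version 1 figure for arrayLength=50.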
Update2:
I have now dropped myArray completely, use sum++ instead, and added GC.Collect():

public void RunTestConstStartConstEnd()
{
    int sum = 0;
    for (int i = constStart; i < constEnd; i++)
        for (int j = constStart; j < constEnd; j++)
            for (int k = constStart; k < constEnd; k++)
                for (int l = constStart; l < constEnd; l++)
                    for (int m = constStart; m < constEnd; m++)
                    {
                        sum++;
                    }
}
Update
This appears to me to be a result of an unsuccessful attempt at optimization by the jitter, not the compiler. In short, if the jitter can determine the lower bound is a constant it will do something different which turns out to actually be slower. The basis for my conclusions takes some proving, so bear with me. Or go read something else if you’re not interested!
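I concluded this after testing four different ways to set the lower bound of the loop:
- Version 1: an explicit literal constant written directly into each loop header.
- Version 2: a value read at run time from a command-line argument.
- Version 3: a local variable, lowerBound, whose value the compiler could infer to be constant.
- Version 4: a local variable, lowerBound, whose value is passed through Confuser, an identity function.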
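As a rough illustration of what four such variants could look like, here is a hypothetical reconstruction. It is not the code used for the measurements: the class name, method names, the small UpperBound, the two-deep nesting (instead of five) and the NoInlining attribute on Confuser are all assumptions made to keep the sketch short and runnable.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical reconstruction for illustration only; not the answerer's code.
public static class LowerBoundVariants
{
    private const int UpperBound = 10; // assumed small bound so the sketch runs quickly

    // Version 1: the lower bound is a literal in each loop header.
    public static int Version1Literal()
    {
        int sum = 0;
        for (int i = 1; i < UpperBound; i++)
            for (int j = 1; j < UpperBound; j++)
                sum++;
        return sum;
    }

    // Version 2: the lower bound is parsed from a string such as a command-line argument.
    public static int Version2RuntimeArgument(string arg)
    {
        int lowerBound = int.Parse(arg);
        int sum = 0;
        for (int i = lowerBound; i < UpperBound; i++)
            for (int j = lowerBound; j < UpperBound; j++)
                sum++;
        return sum;
    }

    // Version 3: a local variable that holds a value known at compile time.
    public static int Version3ConstantLocal()
    {
        int lowerBound = 1;
        int sum = 0;
        for (int i = lowerBound; i < UpperBound; i++)
            for (int j = lowerBound; j < UpperBound; j++)
                sum++;
        return sum;
    }

    // Identity function. NoInlining is an assumption about how one might
    // discourage the jitter from seeing through it.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static int Confuser(int value)
    {
        return value;
    }

    // Version 4: the same value, but routed through Confuser.
    public static int Version4Confused()
    {
        int lowerBound = Confuser(1);
        int sum = 0;
        for (int i = lowerBound; i < UpperBound; i++)
            for (int j = lowerBound; j < UpperBound; j++)
                sum++;
        return sum;
    }
}
```

All four methods count the same number of iterations; only the way the lower bound reaches the loop header differs, which is the property being compared.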
The compiled intermediate language for all four versions of the looping section is almost identical. The only difference is that in version 1 the lower bound is loaded with the command ldc.i4.#, where # is 0, 1, 2, or 3. That stands for load constant. (See ldc.i4 opcode). In all other versions, the lower bound is loaded with ldloc. This is true even in case 3, where the compiler could infer that lowerBound is really a constant.
The resulting performance is not constant. Version 1 (explicit constant) is slower than version 2 (run-time argument) along similar lines as found by the OP. What is very interesting is that version 3 is also slower, with comparable times to version 1. So even though the IL treats the lower bound as a variable, the jitter appears to have figured out that the value never changes, and substitutes a constant as in version 1, with the corresponding performance reduction. In version 4 the jitter can't infer what I know, namely that Confuser is actually an identity function, and so it leaves the variable as a variable. The resulting performance is the same as the command line argument version (2).
My theory on the cause of the performance difference: The jitter is aware and makes use of the fine details of actual processor architecture. When it decides to use a constant other than 0, it has to actually go fetch that literal value from some storage which is not in the L2 cache. When it is fetching a frequently used local variable it instead reads its value from the L2 cache, which is insanely fast. Normally it doesn't make sense to be taking up room in the precious cache with something as dumb as a known literal integer value. In this case we care more about read time than storage, though, so it has an undesired impact on performance.
Here is the full code for the version 2 (command line arg):
For version 1: same as above except remove the lowerBound declaration and replace all lowerBound instances with literal 0, 1, 2, or 3 (compiled and executed separately).
For version 3: same as above except replace lowerBound declaration with
For version 4: same as above except replace lowerBound declaration with
Where Confuser is:
Results (50 iterations of each test, in 5 batches of 10):
That is an enormous array. For all practical purposes you are testing how long it takes your operating system to fetch the value of each element from memory, not how long it takes to compare whether j, k, etc. are less than arrayLength, to increment the counters, and to increment your sum. The latency to fetch those values has little to do with the runtime or jitter per se and a lot to do with whatever else happens to be running on your system as a whole and the current compression and organization of the heap.
In addition, because your array takes up so much room and is accessed so frequently, it's quite possible that garbage collection is running during some of your test iterations, which would completely inflate the apparent CPU time.
Try doing your test without the array lookup: just add 1 (sum++) and then see what happens. To be even more thorough, call GC.Collect() just before each test to avoid a collection during the loop.
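A minimal sketch of that suggestion, assuming a Stopwatch-based harness: the LoopTimer class, its method names and the extra GC.WaitForPendingFinalizers() call are my own choices, not something from the question.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness; names and structure are assumptions.
public static class LoopTimer
{
    // Array-free loop body: five nested loops that only count iterations.
    public static int CountLoops(int start, int end)
    {
        int sum = 0;
        for (int i = start; i < end; i++)
            for (int j = start; j < end; j++)
                for (int k = start; k < end; k++)
                    for (int l = start; l < end; l++)
                        for (int m = start; m < end; m++)
                            sum++;
        return sum;
    }

    // Runs the test `runs` times and returns the elapsed seconds of each run.
    // A collection is forced before every run so that one is less likely
    // to start inside the timed region.
    public static double[] TimeRuns(Func<int> test, int runs, out int lastResult)
    {
        var seconds = new double[runs];
        lastResult = 0;
        for (int r = 0; r < runs; r++)
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Stopwatch watch = Stopwatch.StartNew();
            lastResult = test(); // keep the result so the loop is not trivially dead
            watch.Stop();

            seconds[r] = watch.Elapsed.TotalSeconds;
        }
        return seconds;
    }
}
```

A call such as LoopTimer.TimeRuns(() => LoopTimer.CountLoops(0, 50), 10, out int total) would time ten runs. Note that CountLoops receives its bounds as parameters, so they are run-time values; to time literal bounds, they would have to be written into the loop headers directly.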