Given the following code : for (int i=0; i<n; i++) { counter += myArray[i];

Question

Asked: June 1, 2026

Given the following code:

for (int i=0; i<n; i++)
{
  counter += myArray[i];
}

And the loop-unrolled version:

for (int i=0; i<n; i+=4)
{
  counter1 += myArray[i+0];
  counter2 += myArray[i+1];
  counter3 += myArray[i+2];
  counter4 += myArray[i+3];
}

total = counter1 + counter2 + counter3 + counter4;

Why do we have a cache miss in the first version?
Does the second version indeed perform better than the first? Why?

Regards

1 Answer

Editorial Team · Answer 1 · 2026-06-01T11:40:41+00:00

Why do we have a cache miss in the first version?

As Oli points out in the comments, this question is unfounded. If the data is already in the cache, there will be no cache misses.

That aside, there is no difference in memory access between your two examples: both walk through myArray sequentially. So memory access is unlikely to be a factor in any performance difference between them.

Does the second version indeed perform better than the first? Why?

Usually, the thing to do is to actually measure. But in this particular example, I’d say that it will likely be faster: not because of better cache access, but because of the loop unrolling.
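Since measuring is the real answer, here is a minimal timing sketch. The names time_sum, sum_fn and sum_baseline are hypothetical helpers made up for this illustration, and clock() only gives coarse CPU time, so treat the numbers as a rough comparison rather than a precise benchmark.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical function-pointer type: any routine that sums n ints. */
typedef int (*sum_fn)(const int *a, size_t n);

/* The single-accumulator loop from the question, wrapped in a function. */
int sum_baseline(const int *a, size_t n)
{
    int counter = 0;
    for (size_t i = 0; i < n; i++)
        counter += a[i];
    return counter;
}

/* Hypothetical helper: calls fn `reps` times over the same array and
 * returns the elapsed CPU time in seconds. The last sum is stored in
 * *result so the caller can check that both versions agree. */
double time_sum(sum_fn fn, const int *a, size_t n, int reps, int *result)
{
    volatile int sink = 0;  /* volatile store keeps the calls from being discarded */
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = fn(a, n);
    clock_t end = clock();
    *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Pass each version of the loop to time_sum with the same array and a large enough reps that the run takes a noticeable fraction of a second, then compare the two times. Build with optimizations enabled, otherwise you are mostly measuring the debug build.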

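The optimization that you are doing is called “Node-Splitting”: you split the single counter variable into several independent ones to break the dependency chain. With one counter, every addition has to wait for the previous one to finish; with four, the processor can have several additions in flight at once.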
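To make the node-splitting idea concrete, here is a sketch of it as a complete function. The function names are made up for this example. Note that the loop in the question reads past the end of the array when n is not a multiple of 4, so this version adds a remainder loop; that part is my addition, not something from the question.

```c
#include <stddef.h>

/* Single accumulator: each add depends on the result of the previous one. */
int sum_simple(const int *a, size_t n)
{
    int counter = 0;
    for (size_t i = 0; i < n; i++)
        counter += a[i];
    return counter;
}

/* Four independent accumulators (node splitting). */
int sum_split4(const int *a, size_t n)
{
    int c1 = 0, c2 = 0, c3 = 0, c4 = 0;
    size_t i = 0;

    /* Unrolled part: only runs while a full group of four remains. */
    for (; i + 4 <= n; i += 4) {
        c1 += a[i + 0];
        c2 += a[i + 1];
        c3 += a[i + 2];
        c4 += a[i + 3];
    }

    int total = c1 + c2 + c3 + c4;

    /* Remainder: the last 0 to 3 elements when n is not a multiple of 4. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Both functions read the array in the same order, so any speed difference comes from the shorter dependency chains, not from the memory access pattern.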

However, in this case, you are doing a trivial reduction operation. Many modern compilers are able to recognize this pattern and do this node-splitting for you.

So is it faster? Most likely. But you should check to see if the compiler does it for you.

For the record: I just tested this on Visual Studio 2010, and I am quite surprised that it is not able to do this optimization.

; int counter = 0;
;
; for (int i=0; i<n; i++)
    mov     ecx, DWORD PTR n$[rsp]
    xor     edx, edx
    test    ecx, ecx
    jle     SHORT $LN1@main
$LL3@main:

; {
;     counter += myArray[i];

    add     edx, DWORD PTR [rax]
    add     rax, 4
    dec     rcx
    jne     SHORT $LL3@main
$LN1@main:

; }

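The loop body still performs a single add into edx per iteration, so each addition depends on the one before it. Visual Studio 2010 does not seem to be capable of performing “Node Splitting” for this (trivial) example…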
