Given the code:
for (int i = 0; i < n; ++i)
{
A(i) ;
B(i) ;
C(i) ;
}
And the optimized version:
for (int i = 0; i < (n - 2); i+=3)
{
A(i);
A(i+1);
A(i+2);
B(i);
B(i+1);
B(i+2);
C(i);
C(i+1);
C(i+2);
}
Something is not clear to me: which is better? I can’t see anything that would run faster in the second version. Am I missing something here?
All I see is that each instruction depends on the previous one, meaning that
I need to wait for the previous instruction to finish before starting the next one…
Thanks
In the high-level view of a language, you’re not going to see the optimization. The speed enhancement comes from what the compiler does with what you have.
In the first case, it’s something like:
In the second it’s something like:
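As a rough sketch of both shapes (not real compiler output), the two loops can be written in C with explicit labels and gotos, which is close to how the test-and-jump appears in assembly. The names `body`, `rolled`, and `unrolled` are hypothetical; `body` stands in for the A/B/C calls, and each function returns how many times the loop test ran. The unrolled form assumes n is a multiple of 3.

```c
/* Hypothetical stand-in for the loop body A(i); B(i); C(i). */
static void body(int i, long *sum) { *sum += i; }

/* Rolled loop, lowered by hand: one test-and-jump per element.
   Returns the number of times the loop condition was tested. */
int rolled(int n, long *sum)
{
    int i = 0, tests = 0;
loop:
    ++tests;                      /* TEST FOR LOOP COMPLETION */
    if (!(i < n)) goto done;
    body(i, sum);                 /* DO_SOMETHING */
    ++i;
    goto loop;                    /* jump back to the label */
done:
    return tests;
}

/* Unrolled by 3, lowered by hand: one test-and-jump per three elements.
   Assumes n is a multiple of 3, as in the question's second version. */
int unrolled(int n, long *sum)
{
    int i = 0, tests = 0;
loop:
    ++tests;                      /* TEST FOR LOOP COMPLETION */
    if (!(i < n - 2)) goto done;
    body(i, sum);                 /* DO_SOMETHING */
    body(i + 1, sum);             /* DO_SOMETHING */
    body(i + 2, sum);             /* DO_SOMETHING */
    i += 3;
    goto loop;                    /* jump back to the label */
done:
    return tests;
}
```

For n = 9, `rolled` runs the test 10 times and `unrolled` runs it 4 times, while both do the same work in `body`.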
You can see that in the latter case, the overhead of testing and jumping is paid only once per three units of work. In the first it’s paid once per unit of work, so it happens three times as often.
Therefore, if you have invariants you can rely on (an array whose length is a multiple of 3, to use your example), then it can be more efficient to unroll loops, because the generated assembly spends fewer instructions on loop control relative to the useful work.
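When the multiple-of-3 invariant cannot be guaranteed, one common approach is to follow the unrolled loop with a short cleanup loop for the leftover 0, 1, or 2 elements. The following is a minimal sketch under that assumption; `visit` and `process_all` are hypothetical names, with `visit` standing in for the A/B/C calls.

```c
/* Hypothetical stand-in for A(i); B(i); C(i): marks index i as visited. */
static void visit(int i, int *seen) { seen[i] += 1; }

/* Processes indices 0..n-1, three per iteration where possible.
   Returns the number of indices processed. */
int process_all(int n, int *seen)
{
    int i = 0;

    /* Main unrolled loop: runs while a full group of three remains. */
    for (; i + 2 < n; i += 3) {
        visit(i, seen);
        visit(i + 1, seen);
        visit(i + 2, seen);
    }

    /* Cleanup loop: handles the remaining 0, 1, or 2 indices. */
    for (; i < n; ++i) {
        visit(i, seen);
    }

    return i;
}
```

With n = 8, the unrolled loop covers indices 0 through 5 and the cleanup loop covers 6 and 7, so every index is visited exactly once.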