Consider the following example with for loops:
for(int i = 0; i <= NUM; i++); // forward
for(int i = NUM; i >= 0; i--); // reverse
I tested these loops with gcc (linux-64). Without any optimization flag, the forward loop was faster; with optimization at O3/O4, the reverse loop was faster.
Somewhere I heard that, due to better cache replacement techniques, the forward loop is faster.
Personally, I think the reverse loop should be faster (whether NUM is a constant or a variable), because any microprocessor will have a single instruction for comparison with 0, i >= 0 (i.e. JLZ (jump if less than zero) or an equivalent).
Is there any deterministic answer to this?
No, there is absolutely no deterministic answer for this. You’re looking at two different levels of abstraction.
C++ has absolutely nothing to say about what happens under the covers, performance-wise. It specifies an abstract machine which executes C++ code and, while it covers functionality, it does not cover performance of the underlying environment (a).
Which of those is faster will depend on a variety of factors. You may find yourself running on a CPU which makes no distinction between comparing with an arbitrary value and comparing with zero.
You may find an architecture where incrementing a register is ten times faster than decrementing one, bizarre though that may seem.
You may even find a brain-dead architecture that has no decrement, add or subtract instructions at all, and you have to emulate decrement by calling increment 2^n-1 times (where n is the word size).
Bottom line: you can’t presume to know what’s going on under the hood unless you want to look at a very specific CPU, compiler, etc.
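As an aside, the increment-only emulation mentioned above works because fixed-width unsigned arithmetic wraps around: on an n-bit word, adding 1 a total of 2^n-1 times is the same as subtracting 1. A minimal sketch, assuming an 8-bit word (so 255 increments); the function name is made up for illustration:

```cpp
#include <cstdint>

// Hypothetical helper: computes x - 1 on an 8-bit word using only "+ 1".
// With n = 8, that takes 2^8 - 1 = 255 increments, because unsigned
// arithmetic wraps modulo 2^8.
std::uint8_t decrement_via_increments(std::uint8_t x) {
    for (int k = 0; k < 255; ++k) {
        x = static_cast<std::uint8_t>(x + 1);  // the only operation "available"
    }
    return x;
}
```

Calling decrement_via_increments(5) yields 4, and decrement_via_increments(0) wraps to 255, exactly as a real decrement would on an 8-bit unsigned value.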
You should optimise your code for readability first. If you need to process things in an increasing manner, use the first option. If a decreasing manner, use the latter. If either way seems equally natural, then choose the fastest one, discovered by benchmarking or analysis of the underlying architecture and assembler code. But only do this if you have a specific performance problem, otherwise you’re wasting effort.
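If you do get as far as benchmarking, something along these lines is a reasonable starting point. It is only a rough sketch: the names LoopResult, run_forward and run_reverse are made up here, the volatile accumulator is just one way to stop the optimiser from deleting the loop, and a single timing like this is noisy, so you would want many repetitions on your actual CPU and compiler flags before trusting the numbers.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical result type: a checksum plus the measured wall-clock time.
struct LoopResult {
    std::uint64_t sum;                 // proves both loops did the same work
    std::chrono::nanoseconds elapsed;  // time spent in the loop
};

// Times the forward loop from the question.
LoopResult run_forward(int num) {
    volatile std::uint64_t sink = 0;   // volatile keeps the loop from being optimised away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i <= num; i++) {
        sink = sink + static_cast<std::uint64_t>(i);
    }
    auto stop = std::chrono::steady_clock::now();
    return {sink, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

// Times the reverse loop from the question.
LoopResult run_reverse(int num) {
    volatile std::uint64_t sink = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = num; i >= 0; i--) {
        sink = sink + static_cast<std::uint64_t>(i);
    }
    auto stop = std::chrono::steady_clock::now();
    return {sink, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

Compare the elapsed fields of the two results for the same num, and check that the sum fields match so you know both loops really ran. Note that the volatile accumulator itself changes the generated code, so this measures "loop plus a memory update", not the empty loops from the question.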
In any case, since you’re almost certainly going to be using i for something, it’s likely that whatever tiny increase in performance you get by going the fastest way will be more than swamped by the fact that you now have to calculate NUM-i inside the loop (unless, of course, the compiler is smarter than the developer which, based on what I’ve seen from gcc, is quite possible).
(a) It does specify certain performance-related things such as the time complexity of some things in the containers library, but not specifically the thing you’re asking about, whether forward loops or reverse ones are faster.
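To make the NUM-i point concrete, here is a small sketch of what happens when the work has to be done in increasing order but the loop counter counts down. The function names are invented for the example, and recording the visited indices in a vector is just a stand-in for whatever the loop body really does:

```cpp
#include <vector>

// Visits indices 0..num in increasing order with a forward counter.
std::vector<int> visit_forward(int num) {
    std::vector<int> order;
    for (int i = 0; i <= num; i++) {
        order.push_back(i);        // the index is used directly
    }
    return order;
}

// Same visiting order, but driven by a counter that counts down to zero.
// The loop test is a comparison with zero, at the price of an extra
// subtraction (num - i) in every iteration.
std::vector<int> visit_forward_with_reverse_counter(int num) {
    std::vector<int> order;
    for (int i = num; i >= 0; i--) {
        order.push_back(num - i);  // recover the increasing index
    }
    return order;
}
```

Both functions return the same sequence, so any saving on the loop test in the second version has to outweigh the added subtraction, assuming the compiler doesn't simply rewrite one form into the other.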