I just stumbled upon a change that seems to have counterintuitive performance ramifications. Can anyone provide a possible explanation for this behavior?
Original code:
for (int i = 0; i < ct; ++i) {
// do some stuff...
int iFreq = getFreq(i);
double dFreq = iFreq;
if (iFreq != 0) {
// do some stuff with iFreq...
// do some calculations with dFreq...
}
}
While cleaning up this code during a “performance pass,” I decided to move the definition of dFreq inside the if block, as it was only used inside the if. There are several calculations involving dFreq so I didn’t eliminate it entirely as it does save the cost of multiple run-time conversions from int to double. I expected no performance difference, or if any at all, a negligible improvement. However, the perfomance decreased by nearly 10%. I have measured this many times, and this is indeed the only change I’ve made. The code snippet shown above executes inside a couple other loops. I get very consistent timings across runs and can definitely confirm that the change I’m describing decreases performance by ~10%. I would expect performance to increase because the int to double conversion would only occur when iFreq != 0.
Chnaged code:
for (int i = 0; i < ct; ++i) {
// do some stuff...
int iFreq = getFreq(i);
if (iFreq != 0) {
// do some stuff with iFreq...
double dFreq = iFreq;
// do some stuff with dFreq...
}
}
Can anyone explain this? I am using VC++ 9.0 with /O2. I just want to understand what I’m not accounting for here.
You should put the conversion to dFreq immediately inside the if() before doing the calculations with iFreq. The conversion may execute in parallel with the integer calculations if the instruction is farther up in the code. A good compiler might be able to push it farther up, and a not-so-good one may just leave it where it falls. Since you moved it to after the integer calculations it may not get to run in parallel with integer code, leading to a slowdown. If it does run parallel, then there may be little to no improvement at all depending on the CPU (issuing an FP instruction whose result is never used will have little effect in the original version).
If you really want to improve performance, a number of people have done benchmarks and rank the following compilers in this order:
1) ICC – Intel compiler
2) GCC – A good second place
3) MSVC – generated code can be quite poor compared to the others.
You may also want to try -O3 if they have it.