I’ve reduced my code down to the following, which is as simple as I could make it whilst retaining the compiler output that interests me.
void foo(const uint64_t used)
{
uint64_t ar[100];
for(int i = 0; i < 100; ++i)
{
ar[i] = some_global_array[i];
}
const uint64_t mask = ar[0];
if((used & mask) != 0)
{
return;
}
bar(ar); // Not inlined
}
Using VC10 with /O2 and /Ob1, the generated assembly pretty much reflects the order of instructions in the above C++ code. Since the local array ar is only passed to bar() when the condition fails, and is otherwise unused, I would have expected the compiler to optimize to something like the following.
if((used & some_global_array[0]) != 0)
{
return;
}
// Now do the copying to ar and call bar(ar)...
Is the compiler not doing this because it’s simply too hard for it to identify such optimizations in the general case? Or is it following some strict rule that forbids it from doing so? If so, why, and is there some way I can give it a hint that doing so wouldn’t change the semantics of my program?
Note: obviously it would be trivial to obtain the optimized output by just rearranging the code, but I’m interested in why the compiler won’t optimize in such cases, not how to do so in this (intentionally simplified) case.
There are no “strict rules” controlling what kind of assembly language the compiler is permitted to output. If the compiler can be certain that a block of code does not need to be executed (because it has no side effects) due to some precondition, then it is absolutely permitted to short-circuit the whole thing.
This sort of optimisation can be fairly complex in the general case, and your compiler might not go to all that effort. If this is performance critical code, then you can fine-tune your source code (as you suggest) to help the compiler generate the best assembly code. This is a trial-and-error process though, and you might have to do it again for the next version of the compiler.