Let’s say I have pseudocode like this:
    main() {
        BOOL b = get_bool_from_environment(); // get it from a file, network, registry, whatever
        while(true) {
            do_stuff(b);
        }
    }
    do_stuff(BOOL b) {
        if(b)
            path_a();
        else
            path_b();
    }
Now, since the external environment can make get_bool_from_environment() produce either a true or a false result, the code for both the true and false branches of if(b) must be included in the binary. We can’t simply omit path_a(); or path_b(); from the code.
BUT — we only set BOOL b the one time, and we always reuse the same value after program initialization.
If I were to make this valid C code and then compile it using gcc -O0, the if(b) would be repeatedly evaluated on the processor each time that do_stuff(b) is invoked, which inserts what are, in my opinion, needless instructions into the pipeline for a branch that is basically static after initialization.
If I were to assume that I actually had a compiler that was as stupid as gcc -O0, I would re-write this code to include a function pointer, and two separate functions, do_stuff_a() and do_stuff_b(), which don’t perform the if(b) test, but simply go ahead and perform one of the two paths. Then, in main(), I would assign the function pointer based on the value of b, and call that function in the loop. This eliminates the branch, though it admittedly adds a memory access for the function pointer dereference (due to architecture implementation I don’t think I really need to worry about that).
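A minimal sketch of that manual rewrite in C follows. The counters inside path_a() and path_b() and the bounded iteration count are assumptions added only so that the example terminates and its effect can be observed; the question's loop is infinite. Note that the call through the pointer is itself an indirect branch, so this removes the test of b rather than every branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for the real work of path_a()/path_b(). */
static unsigned long calls_a = 0;
static unsigned long calls_b = 0;

static void path_a(void) { calls_a++; }
static void path_b(void) { calls_b++; }

/* Two variants of do_stuff(), neither of which tests b. */
static void do_stuff_a(void) { path_a(); }
static void do_stuff_b(void) { path_b(); }

/* Select the variant once, then call it through the pointer in the loop.
   The loop is bounded by `iterations` only so that the sketch terminates. */
static void run_with_pointer(bool b, size_t iterations)
{
    void (*do_stuff)(void) = b ? do_stuff_a : do_stuff_b;

    for (size_t i = 0; i < iterations; i++) {
        do_stuff();   /* indirect call; b is not tested here */
    }
}
```

Calling run_with_pointer(b, n) from main() after reading b once would exercise only the selected path n times.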
Is it possible, even in principle, for a compiler to take code of the same style as the original pseudocode sample, and to realize that the test is unnecessary once the value of b is assigned once in main()? If so, what is the theoretical name for this compiler optimization, and can you please give an example of an actual compiler implementation (open source or otherwise) which does this?
I realize that compilers can’t generate dynamic code at runtime, and the only types of systems that could do that in principle would be bytecode virtual machines or interpreters (e.g. Java, .NET, Ruby, etc.) — so the question remains whether or not it is possible to do this statically and generate code that contains both the path_a(); branch and the path_b() branch, but avoid evaluating the conditional test if(b) for every call of do_stuff(b);.
If you tell your compiler to optimise, you have a good chance that the if(b) is evaluated only once.
Slightly modifying the given example (using the standard _Bool instead of BOOL, and adding the missing return types and declarations), the relevant part of the assembly produced by clang -O3 [clang-3.0] shows that b is tested only once, and main jumps into an infinite loop of either path_a or path_b depending on the value of b. If path_a and path_b are small enough, they would be inlined (I strongly expect). With -O and -O2, the code produced by clang would evaluate b in each iteration of the loop.
gcc (4.6.2) behaves similarly with -O3: oddly, it unrolled the loop for path_a, but not for path_b. With -O2 or -O, however, it would call do_stuff in the infinite loop.
Hence, to the question of whether this is possible even in principle, the answer is a definitive yes: compilers can recognise that b never changes inside the loop and take advantage of that fact. Good compilers do so when asked to optimise hard.
I don’t know the name of the optimisation, but two implementations doing that are gcc and clang (at least, recent enough releases).
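For what it is worth, hoisting a loop-invariant test out of a loop and duplicating the loop body for each outcome is usually called loop unswitching (GCC documents a -funswitch-loops flag for it). The sketch below writes that transformation out by hand in C, as an illustration of the shape of the optimised code rather than actual compiler output. As assumptions for the sake of a terminating example, path_a() and path_b() only bump counters and the loops are bounded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for the real work of path_a()/path_b(). */
static unsigned long calls_a = 0;
static unsigned long calls_b = 0;

static void path_a(void) { calls_a++; }
static void path_b(void) { calls_b++; }

/* As written in the question (with do_stuff inlined):
   b is tested on every iteration. */
static void loop_as_written(bool b, size_t iterations)
{
    for (size_t i = 0; i < iterations; i++) {
        if (b)
            path_a();
        else
            path_b();
    }
}

/* Unswitched by hand: b is tested once, then one of two
   specialised loops runs without looking at b again. */
static void loop_unswitched(bool b, size_t iterations)
{
    if (b) {
        for (size_t i = 0; i < iterations; i++)
            path_a();
    } else {
        for (size_t i = 0; i < iterations; i++)
            path_b();
    }
}
```

Both functions perform the same calls for the same inputs; the second simply makes explicit that the binary still contains both paths while the test happens once.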