The GCC manual only shows examples where __builtin_expect() is placed around the entire condition of an ‘if’ statement.
I also noticed that GCC does not complain if I use it, for example, with a ternary operator, or in any arbitrary integral expression for that matter, even one that is not used in a branching context.
So, I wonder what the underlying constraints of its usage actually are.
Will it retain its effect when used in a ternary operation like this:
int foo(int i)
{
    return __builtin_expect(i == 7, 1) ? 100 : 200;
}
And what about this case:
int foo(int i)
{
    return __builtin_expect(i, 7) == 7 ? 100 : 200;
}
And this one:
int foo(int i)
{
    int j = __builtin_expect(i, 7);
    return j == 7 ? 100 : 200;
}
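For reference, the GCC manual documents the signature as long __builtin_expect(long exp, long c) and states that the return value is the value of exp. The three variants above therefore compute exactly the same result and differ only in what hint, if any, the optimizer can draw from them. The sketch below collects them in one file. The function names are hypothetical and were chosen only so that the three definitions can coexist.

```c
/* Variant 1: hint on the whole condition (the pattern the manual shows). */
int foo_ternary(int i)
{
    return __builtin_expect(i == 7, 1) ? 100 : 200;
}

/* Variant 2: hint that the value of i itself is expected to be 7. */
int foo_expect_value(int i)
{
    return __builtin_expect(i, 7) == 7 ? 100 : 200;
}

/* Variant 3: the hinted value is stored first and compared afterwards. */
int foo_via_variable(int i)
{
    int j = __builtin_expect(i, 7); /* j == i; only the hint is extra */
    return j == 7 ? 100 : 200;
}
```

I have not verified whether GCC carries the hint through variants 2 and 3. One way to check is to compare the gcc -O2 -S output for each function.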
It apparently works for both ternary and regular if statements.
First, let’s take a look at the following three code samples. Two of them use __builtin_expect, one in a regular if statement and one in a ternary expression, and the third does not use it at all.
builtin.c:
ternary.c:
nobuiltin.c:
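The original listings of the three files are not reproduced here. Going by the description of the assembly further down (a char argument compared with 'c', i.e. cmp $99), a rough, hypothetical stand-in might look like the following. The function names and the return values are my own assumptions, and the three functions are merged into one unit only so that the sketch compiles on its own.

```c
/* builtin.c (hypothetical): hint inside a regular if statement */
int test_builtin(char c)
{
    if (__builtin_expect(c == 'c', 1))
        return 100;
    else
        return 200;
}

/* ternary.c (hypothetical): the same hint inside a ternary expression */
int test_ternary(char c)
{
    return __builtin_expect(c == 'c', 1) ? 100 : 200;
}

/* nobuiltin.c (hypothetical): the same comparison with no hint at all */
int test_nobuiltin(char c)
{
    if (c == 'c')
        return 100;
    else
        return 200;
}
```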
When compiled with -O3, all three produce the same assembly. However, when -O3 is left out (on GCC 4.7.2), ternary.c and builtin.c produce the same assembly listing (where it matters):
builtin.s:
ternary.s:
whereas nobuiltin.c does not:
The relevant part:
Basically, __builtin_expect causes extra code (sete %al…) to be executed before the je .L2. As a result, the branch depends on the outcome of testl %eax, %eax, which the CPU is more likely to predict as being 1 (a naive assumption on my part), rather than on the direct comparison of the input char with 'c'. In the nobuiltin.c case, by contrast, no such code exists, and the je/jne directly follows the comparison with ‘c’ (cmp $99).

Remember, branch prediction is mainly done in the CPU. Here GCC is simply “laying a trap” for the CPU branch predictor so that it assumes which path will be taken, via the extra code and the swapping of je and jne. I do not have a source for this, though: Intel’s official optimization manual does not mention treating first encounters with je and jne differently for branch prediction. I can only assume the GCC team arrived at this by trial and error.

I am sure there are better test cases where GCC’s branch prediction can be seen more directly (instead of observing hints to the CPU), though I do not know how to construct such a case concisely. (My guess is that it would involve loop unrolling during compilation.)
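One candidate for such a test case is a loop whose body contains a rarely taken branch. The sketch below is untested in that respect: compiling it with gcc -O2 -S, then comparing the result against a copy without the hint, may show whether GCC moves the unlikely block off the hot path. The likely()/unlikely() macros follow the convention used in the Linux kernel. Their !! normalises any scalar condition to 0 or 1 so it can be compared with the expected value. The function sum_checked is hypothetical.

```c
#include <stddef.h>

/* Common wrapper macros; !! turns any scalar into 0 or 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sums the non-negative entries of a and counts
   negative entries as (rare) errors instead of adding them. */
long sum_checked(const int *a, size_t n, size_t *errors)
{
    long sum = 0;
    *errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(a[i] < 0)) {
            ++*errors;      /* cold path: expected to be rare */
            continue;
        }
        sum += a[i];        /* hot path */
    }
    return sum;
}
```

I have not inspected the resulting listing. Treat the example as a starting point for experimentation, not as evidence of any particular code layout.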