I have some critical branching code inside a loop that’s run about 2^26 times. Branch prediction is not optimal because m is random. How would I remove the branching, possibly using bitwise operators?
bool m;
unsigned int a;
const unsigned int k = ...; // k >= 7
if(a == 0)
a = (m ? (a+1) : (k));
else if(a == k)
a = (m ? 0 : (a-1));
else
a = (m ? (a+1) : (a-1));
And here is the relevant assembly generated by gcc -O3:
.cfi_startproc
movl 4(%esp), %edx
movb 8(%esp), %cl
movl (%edx), %eax
testl %eax, %eax
jne L15
cmpb $1, %cl
sbbl %eax, %eax
andl $638, %eax
incl %eax
movl %eax, (%edx)
ret
L15:
cmpl $639, %eax
je L23
testb %cl, %cl
jne L24
decl %eax
movl %eax, (%edx)
ret
L23:
cmpb $1, %cl
sbbl %eax, %eax
andl $638, %eax
movl %eax, (%edx)
ret
L24:
incl %eax
movl %eax, (%edx)
ret
.cfi_endproc
The branch-free division-free modulo could have been useful, but testing shows that in practice, it isn’t.
Testcase:
Actually timing this, using a test program:
(Note: there’s no
srand, so results are deterministic.)My original answer: 5.3s
The code in the question: 4.8s
Lookup table: 4.5s (
static unsigned lookup[2][k+1];)Lookup table: 4.3s (
static unsigned lookup[k+1][2];)Eric’s answer: 4.2s
This version: 4.0s