I’m trying to minimize the number of branch instructions in my compiled assembly code for a particular architecture where branch instructions are very costly due to the way the processor pipelining is implemented.
I could try to implement self-modifying code to reduce the number of times a condition has to be tested in conditional branching, but are there any other things I can look at doing?
You shouldn’t care too much about the number of branch instructions visible in compiled code. You should care about the number of times a branch instruction is executed on the CPU when you run the program.
Two easy ways to reduce the number of branches executed:
1. If your architecture supports predicated instructions, then small if blocks can be generated with predicated instructions instead of branches. You can possibly ask your compiler to do this for you, e.g. if your compiler is GCC, then compiling with -O1, -O2, -O3 or -Os, or using the -fif-conversion2 flag, should do this.
Remember that big if blocks aren’t if-converted, because predicated instructions pass through the CPU pipeline irrespective of whether or not the condition is true, and this wastes cycles.
2. Unroll loops. A loop means a branch. If you unroll it, you can get away with executing fewer branches (although in compiled code, you still ‘see’ the same number of branch instructions, right?).
Remember though: this increases code size, which can mean an increased miss rate on the instruction cache.
For example:
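A minimal sketch of a loop before unrolling, assuming a hypothetical function sum_simple that adds up an array (the name and the summing body are illustrative, not taken from the answer itself):

```c
#include <stddef.h>

/* Hypothetical example: the loop-back branch is executed once per element. */
long sum_simple(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += a[i];   /* loop body */
    }
    return total;
}
```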
If N is known to be even, then unrolling twice manually is as easy as:
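A minimal sketch of a loop unrolled by two, assuming a hypothetical function sum_unrolled2 that sums an array whose length n (the N above) is even. The rolled-up form would do a single `total += a[i]` per iteration with `++i`; here the body is written out twice and the index advances by two:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: two copies of the body per iteration, so the
   loop-back branch is executed n/2 times instead of n times. */
long sum_unrolled2(const int *a, size_t n)
{
    assert(n % 2 == 0);      /* assumption: n is known to be even */
    long total = 0;
    for (size_t i = 0; i < n; i += 2) {
        total += a[i];       /* loop body, first copy */
        total += a[i + 1];   /* loop body, second copy */
    }
    return total;
}
```

If n could be odd, a single extra copy of the body after the loop would be needed to handle the leftover element.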
When this executes, the number of branches executed essentially halves.
Again, your compiler can probably automatically do this too, e.g. GCC unrolls some loops with -funroll-loops.
There are a few other tricks the compiler can do for you. e.g. If it’s GCC, then you should probably search this page for ‘branch’.
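To illustrate the point about small if blocks, here is a sketch with two hypothetical functions. max_if contains the kind of small if block that a compiler may if-convert. Whether it actually emits a predicated instruction or conditional move depends on the target and the flags, so checking the generated assembly is the only way to be sure. max_mask is a hand-written branch-free variant, which is my own addition and assumes two's-complement integers:

```c
/* Small if block: a candidate for if-conversion by the compiler. */
int max_if(int a, int b)
{
    int m = a;
    if (b > a) {
        m = b;
    }
    return m;
}

/* Manual branch-free alternative (assumes two's complement):
   mask is all ones when b > a, and zero otherwise. */
int max_mask(int a, int b)
{
    int mask = -(b > a);
    return (a & ~mask) | (b & mask);
}
```

The mask version trades readability for a guaranteed absence of a source-level branch, so it is probably only worth it if the compiler does not if-convert the plain form on your target.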