So I understand the basic techniques that are used in branch prediction for pipelined processors – stuff like 2-bit saturating counters, two-level adaptive predictors, etc.
Here are my questions:
1) Branch target prediction: why is this important and what are some of the mechanisms used here? When I think of a branch I think “bne r2, r3, LABEL”, which says that if r2 != r3 then branch to LABEL, which means do PC (program counter) = PC + offset, where the offset to LABEL is encoded in the instruction. What’s so mysterious about predicting the target here? You know what it’s going to be based on the compiled value of LABEL.
I’m probably missing the point here somehow.
2) Why is the program counter value itself (e.g. 0x4001000C), or at least its last few bits, used as part of the branch prediction scheme? I saw a scheme where the last 4 bits of the PC were concatenated to the (4-bit) branch history register and that 8-bit value was used to access the pattern history table.
I would think the PC is pretty arbitrary!
Thank you for any help understanding these issues.
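Because of the CPU pipeline depth and the cache latency, many cycles pass between fetching a branch instruction, decoding it far enough to compute the branch target, and being able to fetch the instruction at that target. At fetch time the processor does not even know yet that the instruction is a branch. So you predict the target, using nothing but the PC, in order to pre-emptively fetch the next instruction instead of stalling. The usual mechanism is a branch target buffer (BTB): a small cache, indexed by the PC, that remembers the target a branch went to last time. For indirect branches (returns, jumps through a register) the target is not encoded in the instruction at all, so it can only be predicted.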
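To make the lookup concrete, here is a minimal sketch of a direct-mapped branch target buffer. Everything in it is an assumption for illustration, not a description of any real CPU: the class name `BranchTargetBuffer`, the 16-entry size, the 4-byte instruction alignment, and the example target address are all made up.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB: maps a fetch PC to the last target seen for it."""

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        # Each entry is either None or a (tag, target) pair.
        self.entries = [None] * (1 << index_bits)

    def _split(self, pc):
        word = pc >> 2  # assumption: instructions are 4-byte aligned
        index = word & ((1 << self.index_bits) - 1)
        tag = word >> self.index_bits
        return index, tag

    def predict(self, pc):
        """Return the predicted target, or None on a miss.

        On a miss the fetch unit would simply continue at pc + 4.
        """
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc, target):
        """Record the real target once the branch resolves later in the pipeline."""
        index, tag = self._split(pc)
        self.entries[index] = (tag, target)


btb = BranchTargetBuffer()
first = btb.predict(0x4001000C)       # never seen this PC: miss
btb.update(0x4001000C, 0x40010000)    # branch resolved, target now known
second = btb.predict(0x4001000C)      # same PC fetched again: hit
other = btb.predict(0x4002000C)       # same index, different tag: miss
```

The point of the sketch is that `predict` needs only the PC, so it can run in the same cycle as the instruction fetch, long before decode has computed PC + offset.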
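Because the PC identifies a particular branch instruction! How else are you going to index the branch-prediction table? The PC is not arbitrary from the predictor's point of view: the same branch always sits at the same address, so its address is a natural key for that branch's history. Using only the low bits is a cheap hash. Two branches can occasionally alias to the same entry, but mixing PC bits into the index still keeps most branches' counters apart.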
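The scheme described in the question (4 PC bits concatenated with a 4-bit history register to index a 256-entry pattern history table) could be sketched roughly as follows. The class name `ConcatPredictor`, the initial counter value, and the choice to drop the two byte-offset bits of the PC before taking the low 4 bits are assumptions for illustration; the scheme in the question may differ in those details.

```python
class ConcatPredictor:
    """Toy predictor: index = (low PC bits) concatenated with global history."""

    def __init__(self, pc_bits=4, hist_bits=4):
        self.pc_bits = pc_bits
        self.hist_bits = hist_bits
        self.history = 0  # last hist_bits outcomes, newest in the low bit
        # 2-bit saturating counters, all starting at 1 (weakly not taken).
        self.pht = [1] * (1 << (pc_bits + hist_bits))

    def _index(self, pc):
        # Assumption: skip the two always-zero bits of a 4-byte-aligned PC.
        pc_low = (pc >> 2) & ((1 << self.pc_bits) - 1)
        return (pc_low << self.hist_bits) | self.history

    def predict(self, pc):
        """Predict taken when the counter is in one of its two upper states."""
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into the history."""
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        mask = (1 << self.hist_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


BRANCH_A = 0x4001000C  # hypothetical branch that is always taken
BRANCH_B = 0x40010010  # hypothetical branch that is never taken

p = ConcatPredictor()
same_history_differs = p._index(BRANCH_A) != p._index(BRANCH_B)

for _ in range(20):
    p.update(BRANCH_A, True)
    p.update(BRANCH_B, False)

pred_a = p.predict(BRANCH_A)
p.update(BRANCH_A, True)
pred_b = p.predict(BRANCH_B)
```

With the same history value, the two branches still select different counters because their PC bits differ, which is exactly what the PC bits are there for. Drop them from `_index` and any two branches that happen to see the same history would share one counter.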