Since the `ret` instruction is an indirect branch, does `ret` on x86 stall the pipeline, or is it somehow optimized to behave more like a direct jump?
From the Intel Optimization Reference Manual, the branch prediction unit contains a Return Stack Buffer (RSB) precisely to predict `ret` instructions more accurately (section 2.2.2.1). The instruction queueing and decode unit also tracks changes in the stack pointer to improve decoding bandwidth (section 2.2.2.5).

In more detail, section 3.4.1.4 describes some “rules”, mostly aimed at compiler writers, for getting the most out of inlining, calls and returns. The most relevant is probably that a near/far call must be paired with a near/far return, which means pushing the return address on the stack and jumping to the callee is not recommended. Also, the call depth is recommended not to exceed 16 nested calls (the size of the RSB).
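To make those rules concrete, here is a toy Python model (not how the hardware is actually built) of a return stack buffer sitting next to the real stack. It assumes a 16-entry buffer that silently drops its oldest entry on overflow, and it counts a `ret` with an empty buffer as a misprediction; real CPUs differ on both points (some fall back to the indirect branch predictor on underflow). The class and function names are hypothetical. The model shows the two failure modes the rules warn about: nesting deeper than the RSB, and a `push` + `jmp` pseudo-call that the RSB never sees.

```python
from collections import deque

RSB_SIZE = 16  # entries; the figure cited above, real sizes vary by microarchitecture


class ReturnStackModel:
    """Hypothetical toy model: the architectural stack plus a bounded RSB."""

    def __init__(self, rsb_size=RSB_SIZE):
        self.stack = []                    # the real stack in memory
        self.rsb = deque(maxlen=rsb_size)  # predictor; appending when full drops the oldest entry
        self.hits = 0
        self.misses = 0

    def call(self, return_addr):
        # `call` pushes the return address on the stack AND on the RSB.
        self.stack.append(return_addr)
        self.rsb.append(return_addr)

    def push_and_jmp(self, return_addr):
        # Anti-pattern: push a return address by hand, then `jmp` to the callee.
        # The stack gets the address, but the RSB never sees it.
        self.stack.append(return_addr)

    def ret(self):
        # `ret` pops the real target from the stack; the RSB supplies the prediction.
        actual = self.stack.pop()
        predicted = self.rsb.pop() if self.rsb else None  # empty RSB: no prediction (simplification)
        if predicted == actual:
            self.hits += 1
        else:
            self.misses += 1
        return actual


def nested_calls(depth, model):
    """Simulate `depth` nested calls followed by the matching returns."""
    for level in range(depth):
        model.call(0x1000 + level)
    for _ in range(depth):
        model.ret()
    return model.hits, model.misses


print(nested_calls(16, ReturnStackModel()))  # (16, 0): fits in the RSB
print(nested_calls(20, ReturnStackModel()))  # (16, 4): the 4 outermost returns miss

m = ReturnStackModel()
m.call(0xA)          # outer function called normally
m.push_and_jmp(0xB)  # inner "call" done with push + jmp
m.ret()              # RSB predicts 0xA, real target is 0xB: miss
m.ret()              # RSB is now empty, real target is 0xA: miss
print(m.hits, m.misses)  # 0 2
```

Note how a single unmatched pseudo-call costs two mispredictions in this model: the RSB is out of sync with the real stack for every return that follows it.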
If those rules are followed, you can effectively treat returns like indirect branches during branch type selection (section 3.4.1.6), with everything that implies. You will most likely never encounter a stall on a `ret`, except in pathological cases or with self-modifying code.