So I’m trying to understand the Tomasulo algorithm for out-of-order instruction execution. Here’s what I’m getting so far:
-
Instructions are fetched in-order and stored in an instruction queue.
-
Register renaming happens somewhere next…? From what I understand this is to avoid WAR/WAW hazards by giving labels to the registers. Say you have
add r1,r2,r3 (1)
add r3,r5,r6 (2)
You have a WAR hazard and need to ensure that instruction (1) reads the old value of r3 before adding it to r1. So I guess within the instruction queue (?) the hardware renames the registers, i.e.
add r1,r2,r3#1
add r3#2,r5,r6
Or something like that. -
Instructions are issued to reservation stations. From what I understand, each functional unit has its own set of reservation stations. But is it like a queue (FIFO) of instructions for that functional unit to execute when the appropriately tagged operands are available on the common data bus?
-
Since instructions can finish in arbitrary order (out of order), and more instructions can keep coming… Is there like a stage where the common data bus updates the register file before more instructions come in? I’ve heard a re-order buffer is used, which basically sorts the instructions back in order (this must imply instructions have a tag of some sort) and then the register results are committed back into the register file.
What I’m confused about is the implementation of register renaming, and the structure of the reservation stations.
Thank you for any help.
Tomasulo’s algorithm is really not tied to any specific hardware and in fact, in real machines, register renaming generally happens before instructions are inserted into the instruction queue. Let’s get a couple of fundamental points across. First, the registers as expressed in your program are logical registers. Second, the actual hardware registers are called physical registers. Tomasulo’s algorithm is simply a mechanism for mapping logical registers to physical registers. In real machines, there are generally two tables mapping logical registers to physical registers. One in the rename stage and one in the commit stage. There also must be more physical registers than logical ones. It works essentially this way:
The part about reservation stations is really a specific implementation of an out-of-order pipeline and essentially comes along with a physical register. There are plenty of machines that don’t really have a notion of a reservation station.
Hennessy and Patterson’s books on Computer Architecture are the standard textbook on this sort of stuff. I tried to explain this as simply as possible, but there are literally thousands of proposals for optimizing this stuff.