The integer unit (fixed-point unit) consists of a four-stage pipeline and the general purpose registers. The pipeline stages are decode, execute, the combined complete and cache access (CACC) stage, and write back. These are the same stages that exist on the chip, with some portions abstracted for programming reasons.
A new instruction enters the decode stage only if the IU pipeline is not stalled with an instruction already occupying that stage. In the decode stage the instruction is broken down, and the opcode, the registers used, and any immediate values are decoded. The decode stage passes these instruction parts to the execute stage.
The execution stage serves as both the integer cache arbiter and the ALU. It first checks whether the instruction is a memory access (lw/sw) instruction. If it is, the execution stage dispatches to the memory unit the address for a load word, or the address and the data to store for a store word. The instruction is then passed to the cache access (CACC) stage to complete the memory access. Otherwise, if the instruction is an ALU instruction, the result is computed, the CR is updated, and the instruction is passed to the write back stage. The execution stage also keeps track of whether it stalled in the previous clock cycle.
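The routing decision made by the execution stage can be sketched as follows. This is only an illustration under stated assumptions: the names `Instr` and `execute`, the `add`/`sub` mnemonics, and the three-flag CR are hypothetical and are not taken from the simulator itself.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str              # hypothetical mnemonics: "lw", "sw", "add", "sub"
    a: int = 0           # first operand value (base address for lw/sw)
    b: int = 0           # second operand value (offset for lw/sw)
    store_data: int = 0  # value to write, used only by "sw"


def execute(instr, memory_requests):
    """Route one instruction; return (next_stage, result, cr)."""
    if instr.op == "lw":
        # Load word: only the address goes to the memory unit.
        memory_requests.append(("load", instr.a + instr.b))
        return "CACC", None, None
    if instr.op == "sw":
        # Store word: the address and the data to store are both sent.
        memory_requests.append(("store", instr.a + instr.b, instr.store_data))
        return "CACC", None, None
    if instr.op == "add":
        result = instr.a + instr.b
    elif instr.op == "sub":
        result = instr.a - instr.b
    else:
        raise ValueError(f"unsupported op: {instr.op}")
    # Assumed, simplified CR: just the sign and zero flags of the result.
    cr = {"lt": result < 0, "gt": result > 0, "eq": result == 0}
    return "WB", result, cr
```

Memory instructions leave the stage headed for CACC with no result, while ALU instructions carry their result and CR flags directly to write back.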
When an instruction enters the cache access (CACC) stage, the stage determines whether it is a load or a store instruction. For a load instruction, the CACC stage simply waits until it receives data from the cache/memory unit. For a store instruction, it sends the data to be stored to the cache and repeats the request until it receives a positive response from the cache, which could take several clock cycles. The CACC stage stalls whenever memory takes more than one cycle to grant access or return a value.
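A minimal sketch of this wait-and-retry behaviour is shown below. The class name `CaccStage`, its `cycle` method, and the convention that the cache replies with `True`/`False` for a store and with data or `None` for a load are assumptions made for illustration.

```python
class CaccStage:
    """Hypothetical model of the CACC stage for one memory instruction."""

    def __init__(self, kind):
        self.kind = kind      # "load" or "store"
        self.stalled = False  # True while the access is still outstanding
        self.loaded = None    # data returned by a completed load

    def cycle(self, cache_reply):
        """Advance one clock cycle; return True once the access completes.

        For a store, cache_reply is True when the cache accepts the write.
        For a load, cache_reply is the returned data, or None if not ready.
        """
        if self.kind == "store":
            done = bool(cache_reply)
        else:
            done = cache_reply is not None
            if done:
                self.loaded = cache_reply
        # Any cycle without a positive reply is a stall; retry next cycle.
        self.stalled = not done
        return done


# A store that the cache accepts only on the third request.
store = CaccStage("store")
history = [store.cycle(reply) for reply in (False, False, True)]
```

In this sketch the stage simply reissues the same request every cycle, so the number of stalled cycles equals the number of negative replies.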
An instruction enters the write back stage from the execution stage, the CACC stage, or both. The write back stage tells the GPR manager the new values for its registers. Up to two registers can be updated in one clock cycle: one from the CACC stage and one from the execution stage ALU.
The general purpose register (GPR) manager is simplified. There are two sets of 32 registers: one is the actual register set, and the other is the "rename" set. As an instruction flows through the integer unit pipeline, its temporary values are stored in the "rename" registers. This gives all the other stages in the pipeline forwarded access to the information. Using the two sets of registers also allows the GPR manager to detect hazards easily, by marking a register invalid when a later stage is still waiting for its new value and an earlier stage needs that value. An example is a load instruction followed by an ALU instruction. The load instruction may stall if the cache does not currently hold the requested data, so an ALU instruction that uses the register being loaded must not execute until the new value is obtained. The execute stage will try to access the register and will not be granted access, causing it to stall and thus protecting the instruction flow.
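The two register sets and the invalid marking can be sketched as follows. The class `GprManager` and its method names are hypothetical, and returning `None` from `read` to signal "access not granted" is an assumption made for this illustration.

```python
class GprManager:
    """Hypothetical GPR manager with an actual set and a "rename" set."""

    def __init__(self, count=32):
        self.regs = [0] * count       # actual (architectural) register set
        self.rename = [0] * count     # newest in-flight value per register
        self.valid = [True] * count   # False while a new value is pending

    def mark_pending(self, reg):
        # A later stage (e.g. a load in CACC) will produce this register.
        self.valid[reg] = False

    def write_rename(self, reg, value):
        # The value is now available for forwarding to other stages.
        self.rename[reg] = value
        self.valid[reg] = True

    def read(self, reg):
        # None means access is not granted, so the reader must stall.
        return self.rename[reg] if self.valid[reg] else None

    def commit(self, reg):
        # Write back: copy the rename value into the actual register.
        self.regs[reg] = self.rename[reg]


gpr = GprManager()
gpr.mark_pending(3)            # lw r3, ... is waiting on the cache
first_try = gpr.read(3)        # a dependent ALU instruction must stall
gpr.write_rename(3, 99)        # the load data arrives
second_try = gpr.read(3)       # forwarded before write back happens
gpr.commit(3)
```

The dependent instruction sees the loaded value from the "rename" set as soon as it arrives, without waiting for the write back stage to update the actual register.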
Although the units themselves execute in a non-deterministic order relative to one another, the same does not necessarily hold within a unit. In the integer unit's pipeline, the stages must run in a fixed order during the "do_phase" of the clock cycle, but not necessarily during the start, end, or report phases.
During the start phase of the clock cycle, three of the four stages of the integer unit pipeline read their current state and pass messages to other units. The exceptions are a stage that stalled in the previous cycle, which already knows its state, and the integer decode stage. The decode stage is idle in the start phase because reading its data causes that data to be destroyed by the instruction queue manager, and it must first be determined whether the pipeline is going to stall.
In the start phase, the execution stage first checks whether it stalled in the previous clock cycle. If it did not, it reads its latch and checks whether the instruction is an ALU instruction or a memory access instruction. If the instruction is a memory access, the execution stage requests cache arbitration and sends the specified address to the cache. If the instruction is an ALU instruction, nothing else can be done in the start phase of the clock cycle. The cache access (CACC) stage acts similarly to the execution stage. It first checks whether it stalled in the previous clock cycle. If it did not, it checks whether its instruction is a store instruction; if so, it requests to write the data to the cache. The cache may or may not grant access. If it does not, the CACC stage stalls and tries again in the next clock cycle; otherwise the start phase is over for the CACC stage. The write back stage reads its latch unconditionally in the start phase.
The "do phase" must be ordered to properly protect data within the integer pipeline. The actions start at the write back stage and continue backward to the decode stage. The write back stage instructs the GPR manager to update the register values. Next, the CACC stage determines whether its instruction is a load instruction; if it is, the stage checks whether the memory unit returned a value in the start phase. If the value was received, the CACC stage instructs the GPR manager to update the "rename" register being loaded; otherwise it stalls and checks again in the next clock cycle. The execution stage of the pipeline runs next. If it has an ALU instruction, it computes the necessary values and updates the CR; it then marks itself to stall if the CACC stage stalled. Finally, the decode stage executes: it peeks into IQ0 only if there was no other stall in the pipeline, and breaks down the instruction for the next clock cycle.
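The back-to-front ordering of the "do phase" can be sketched as below. The function `do_phase`, its parameters, and the log strings are hypothetical; the sketch shows only how a CACC stall found early in the phase is visible to the execution and decode stages that run after it.

```python
def do_phase(cacc_instr, load_reply, iq0):
    """Run one "do phase" in write back -> CACC -> execute -> decode order.

    cacc_instr: "load", "store", or None (what the CACC stage holds).
    load_reply: data the memory unit returned in the start phase, or None.
    iq0:        the instruction waiting at the head of the queue.
    Returns (log, decoded), where decoded is None if decode had to hold.
    """
    log = ["WB: update GPR"]

    cacc_stalled = False
    if cacc_instr == "load":
        if load_reply is None:
            cacc_stalled = True
            log.append("CACC: stall")
        else:
            log.append(f"CACC: rename <- {load_reply}")
    else:
        log.append("CACC: no load")

    # Execute runs after CACC, so it already knows whether CACC stalled.
    ex_stalled = cacc_stalled
    log.append("EX: stall" if ex_stalled else "EX: compute")

    # Decode peeks into IQ0 only when nothing downstream stalled.
    decoded = None
    if not (cacc_stalled or ex_stalled):
        decoded = iq0
        log.append("DEC: peek IQ0")
    else:
        log.append("DEC: hold")
    return log, decoded
```

Running the stages in the opposite order would let decode consume IQ0 before learning that a later stage had stalled, which is the data hazard the ordering is meant to prevent.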
The end phase of the integer pipeline can also be executed in any order. If the decode stage did not have to stall, it writes its latch. The execution stage checks whether it stalled; if it did not, it writes to the write back latch if it had an ALU instruction, or to the CACC latch if it had a memory instruction, and it updates its stalled_last variable. If the CACC stage did not have to stall and had a load instruction, it writes to the write back latch. The write back stage does nothing in the end phase.
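The latch writes of the end phase can be sketched as follows. Apart from `stalled_last`, every name here (the `end_phase` function, the dictionary keys, and the latch names) is hypothetical, and updating `stalled_last` on every cycle, stalled or not, is an assumption of this sketch.

```python
def end_phase(dec, ex, cacc):
    """Return the latches written this cycle as a dict.

    Each argument is a dict describing one stage:
      dec:  {"stalled": bool, "instr": ...}
      ex:   {"stalled": bool, "instr": ..., "is_memory": bool}
      cacc: {"stalled": bool, "instr": ..., "is_load": bool}
    """
    latches = {}

    # Decode hands its decoded instruction to execute unless it stalled.
    if not dec["stalled"]:
        latches["decode_to_execute"] = dec["instr"]

    # Execute feeds CACC for memory instructions, write back for ALU ones.
    if not ex["stalled"]:
        target = "execute_to_cacc" if ex["is_memory"] else "execute_to_wb"
        latches[target] = ex["instr"]
    ex["stalled_last"] = ex["stalled"]  # assumed to be recorded every cycle

    # Only a completed load has a value to pass on to write back.
    if not cacc["stalled"] and cacc["is_load"]:
        latches["cacc_to_wb"] = cacc["instr"]

    return latches
```

Because each stage writes a different latch, the three blocks are independent of one another, which is consistent with the end phase being safe to run in any order.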