Overview

The Branch Processing Unit is responsible for changing the direction of program flow when branches are taken, predicting unresolved branches, and recovering from mispredicted branches by communicating with the integer, floating-point, and instruction units. The BPU also maintains special purpose registers that determine the branch target and whether conditional branches are taken.

Implementation Overview

The BPU's responsibilities are divided among five classes: BPU, BranchRegs, BranchExecute, BranchMispredict, and BranchWriteback. The BPU class acts as a shell-class, encapsulating and directing requests to the other BPU sub-classes. An instance of the BPU class contains a single instance of each of the BranchRegs, BranchExecute, BranchMispredict, and BranchWriteback sub-classes. The BranchRegs class is responsible for maintaining the contents of the BPU's special purpose registers. The BranchExecute class correlates to the PPC601's BE (Branch Execute) stage. The BranchMispredict class performs the duties of the PPC601's MR (Mispredict Recovery) stage. The BranchWriteback class handles the responsibilities of the PPC601's BW (Branch Writeback) stage.

BranchRegs class

The BranchRegs class maintains the contents of the BPU's three special purpose registers: the link register (LR), the count register (CTR), and the condition register (CR). The LR and the CTR can be modified by either the integer unit or from within the BPU. To avoid stalling or RAW data hazards, the BranchRegs class contains nine shadow registers for each of these registers, as does the PPC601. There are no shadow registers for the CR, because the CR can only be modified by integer unit instructions. The BranchRegs class provides three methods for reading and modifying the BPU registers from within the BPU.

  enum BPU::Reg {CR0, CR1, CR2, CR3, CR4, CR5, CR6, CR7, CR, LR, CTR}
  WORD BranchRegs::Read(BPU::Reg reg)
  void BranchRegs::Write(BPU::Reg reg, WORD value)
  void BranchRegs::Update(BPU::Reg reg)
Write allocates an available shadow register and writes the modified value into this register. Read returns the value of the most recent shadow register. Update writes the contents of the least recent shadow register into the actual register and frees the shadow register.

The integer unit cannot process instructions ahead of the BPU, so the integer unit reads and writes directly to BPU registers. Two methods are provided for integer unit access to the BPU registers.

  WORD BranchRegs::ReadIU(BPU::Reg reg)
  void BranchRegs::WriteIU(BPU::Reg reg, WORD value)

Common sub-class features

Each of the BranchExecute, BranchMispredict, and BranchWriteback classes has three member functions.
  bool Ready()
  void Load(Instruction* i)
  Instruction* CurInstruc()
Ready determines if the stage is ready for the next branch instruction to be loaded. Load actually loads the next instruction into the stage. In the BranchMispredict class, instead of a single parameter, Load has three parameters. The two additional parameters, predictedTaken and mispredictAddress, have previously been calculated in the BranchExecute stage.
  void BranchMispredict::Load(Instruction* i, bool predictedTaken,
                              WORD mispredictAddress)
CurInstruc returns the instruction that was in the stage during the cycle that has just completed.

Like all processing units and sub-units in the simulator, each of these classes has three member functions which, when called, perform a portion of the work for that cycle.

  void StartStage()
  void DoStage()
  void EndStage()
The intent of StartStage, DoStage, and EndStage is to support nondeterministic parallelism. These stages represent the start, do, and end phases of the clock cycle. Most BPU actions, however, are only important at either the extreme beginning or end of a cycle, so the do phase does nothing in any of the BPU's sub-classes (please see the note in the TODO section below).

Branch prediction

A significant amount of BPU code is dedicated to branch prediction and misprediction recovery. Some integer instructions, like cmp, modify fields of the CR which are read by later BPU instructions. Branch prediction is needed when dependent BPU instructions following integer instructions reach the BE stage before their integer counter-parts reach the IE stage in the integer pipeline. To avoid stalling or RAW data hazards when branches are not "resolved", the BPU makes an educated guess about which direction to branch and makes sure to cleanup afterwards when it guesses incorrectly. The prediction algorithm is to predict all branches as not taken, unless the branch is a "bc" instruction branching backwards. If the last bit of a branch instruction's BO field is asserted (1), the prediction is reversed. This branch prediction code is in BranchExecute::StartStage and BranchExecute::Conditional. Sometime after the branch is predicted and the processor begins executing the predicted path, the branch is resolved. If the branch was predicted incorrectly earlier, the instruction queue and integer unit are flushed of predicted instructions and the processor begins executing the untaken path. This misprediction recovery code is in BranchMispredict::EndStage.

BranchExecute class

During the start phase, the BranchExecute class calculates the target address for the current branch instruction. If the branch is conditional, the mispredict address is calculated and the BranchExecute class determines if the branch has been resolved yet (IU::CompletedIE). If the branch is unconditional or the branch has been resolved and should be taken, the dispatcher is instructed to fold the instruction queue after the current instruction's address (Dispatcher::Fold) and the fetcher is instructed to begin fetching at the target address (Fetcher::Fetch). Otherwise, the branch is predicted as described earlier and if the branch is predicted as taken, the instruction queue is folded and the fetcher begins fetching at the target address (Fetcher::Fetch). Unresolved branches are then passed on to the MR stage (BranchMispredict::Load). The BranchExecute class does nothing during the do phase. During the end phase, instructions that have not been passed to the MR stage and that update either the LR or CTR registers are passed on to the BW stage (BranchWriteback::Load).

BranchMispredict class

During the start phase, the BranchMispredict class instructs the integer unit not to let any instructions from the predicted path past the ID stage (IU::SetPredict). Nothing is done here either during the do phase. A majority of the BranchMispredict's work is performed during the end phase. The BranchMispredict class first determines whether the current branch instruction has been resolved yet (IU::CompletedIE) and if so, whether the branch was predicted correctly. If the branch prediction was correct, the integer unit is instructed to clear the block preventing predicted instructions from passing beyond the ID stage (IU::ClearPredict). If the prediction was incorrect, predicted instructions in the integer unit and instruction queue are flushed (Dispatcher::Flush and IU::FlushPredict) and the fetcher is instructed to begin fetching at the mispredict address (Fetcher::Fetch). If the branch instruction has modified the LR or CTR, the instruction is passed on to the BW stage (BranchWriteback::Load). If the branch has not been resolved yet, the current instruction remains in the MR stage until the next cycle.

BranchWriteback class

All of the work for the BranchWriteback class is performed in the start phase. Up to nine branch instructions may be waiting to writeback in the writeback stage. In the start phase, the BranchWriteback class determines if any of the waiting instructions can writeback by querying the integer unit (IU::CompletedIC). If so, the registers that have been modified by each instruction are updated in BranchRegs::Update and the instructions are retired. Nothing happens during either the do phase or the end phase.

BPU class

Each of the simulator's processing units has been designed to encapsulate it, as much as possible, from the other parts of the simulator. Thus, an outside user of the BPU (IU, Dispatcher, ...) should have no direct access to the BranchRegs, BranchExecute, BranchMispredict, or BranchWriteback sub-classes. Access to important features of the BPU is given indirectly through the BPU class itself.
  enum BPU::Reg {CR0, CR1, CR2, CR3, CR4, CR5, CR6, CR7, CR, LR, CTR}
  bool BPU::Ready()
  bool BPU::Load(Instruction* i)
  WORD BPU::ReadReg(Reg reg)
  void BPU::WriteReg(Reg reg, WORD value)
  void BPU::StartStage()
  void BPU::DoStage()
  void BPU::EndStage()
  void BPU::ReportStats()
These functions are basically wrappers around calls to other functions of the same (or similar) name within BPU sub-classes. Ready and Load call BranchExecute::Ready and BranchExecute::Load, respectively. ReadReg and WriteReg call BranchReg::ReadIU and BranchReg::WriteIU. StartStage, DoStage, and EndStage call each of the functions of the same name in the BranchWriteBack, BranchExecute, and BranchMispredict classes, in order. ReportStats calls CurInstruc in each of the BranchWriteBack, BranchExecute, and BranchMispredict classes and sends this information to the user interface.

TODO

The simulator's BPU is not yet a completely accurate simulation of the PPC601's BPU.

Primarily, delayed purge is not implemented. When delayed purge is working and a branch is predicted as taken, the instructions on the not-taken path are not immediately purged from the instruction queue and integer unit. Instead, these instructions are allowed to execute until the instructions from the taken path enter the instruction queue from the cache. If the branch is resolved as not taken before the instructions from the taken path return from the cache, the taken instructions are flushed, and the processor continues executing the not-taken path.

Also, communication between the BPU and FPU is lacking. Ideally, when predicting branches, the BPU should interact with the FPU much as it does with the integer unit. The FPU should have analogies to IU::SetPredict, IU::FlushPredict, and IU::ClearPredict.

The above changes will probably require improvement of the instruction tagging and register dependency mechanisms. Currently, tags are primarily used for synchronization of instructions in the BPU and FPU with instructions in the integer unit. This synchronization relationship is not mutual, thus there will be problems if, for instance, the integer unit is ahead of the FPU. A mechanism probably needs to be implemented to solve this problem and also provide RAW dependency checking within the pipelines themselves.

Zero cycle BPU dispatch and requests force a lot of the BPU's actions to occur in the start phase and end phase. However, some of the code that is currently in the start phase and end phase does not have to be there and can safely be moved into the do phase to more closely support the simulator's cycle design.


Floating Point Unit

Return to Design Outline