Overview
The Branch Processing Unit is responsible for changing the direction
of program flow when branches are taken, predicting unresolved
branches, and recovering from mispredicted branches by communicating
with the integer, floating-point, and instruction units. The BPU also
maintains special purpose registers that determine the branch target
and whether conditional branches are taken.
Implementation Overview
The BPU's responsibilities are divided among five classes: BPU,
BranchRegs, BranchExecute, BranchMispredict, and BranchWriteback. The
BPU class acts as a shell-class, encapsulating and directing requests
to the other BPU sub-classes. An instance of the BPU class contains a
single instance of each of the BranchRegs, BranchExecute,
BranchMispredict, and BranchWriteback sub-classes. The BranchRegs
class is responsible for maintaining the contents of the BPU's special
purpose registers. The BranchExecute class correlates to the PPC601's BE
(Branch Execute) stage. The BranchMispredict class performs the
duties of the PPC601's MR (Mispredict Recovery) stage. The
BranchWriteback class handles the responsibilities of the PPC601's BW
(Branch Writeback) stage.
BranchRegs class
The BranchRegs class maintains the contents of the BPU's three special
purpose registers: the link register (LR), the count register (CTR),
and the condition register (CR). The LR and the CTR can be modified
by either the integer unit or from within the BPU. To avoid stalling
or RAW data hazards, the BranchRegs class contains nine shadow
registers for each of these registers, as does the PPC601. There are no
shadow registers for the CR, because the CR can only be modified by
integer unit instructions. The BranchRegs class provides three
methods for reading and modifying the BPU registers from within the
BPU.
enum BPU::Reg {CR0, CR1, CR2, CR3, CR4, CR5, CR6, CR7, CR, LR, CTR}
WORD BranchRegs::Read(BPU::Reg reg)
void BranchRegs::Write(BPU::Reg reg, WORD value)
void BranchRegs::Update(BPU::Reg reg)
Write allocates an available shadow register and writes the
modified value into this register. Read returns the value of
the most recent shadow register. Update writes the contents of
the least recent shadow register into the actual register and frees
the shadow register.
The integer unit cannot process instructions ahead of the BPU, so the
integer unit reads and writes directly to BPU registers. Two methods
are provided for integer unit access to the BPU registers.
WORD BranchRegs::ReadIU(BPU::Reg reg)
void BranchRegs::WriteIU(BPU::Reg reg, WORD value)
Common sub-class features
Each of the BranchExecute, BranchMispredict, and BranchWriteback classes
has three member functions.
bool Ready()
void Load(Instruction* i)
Instruction* CurInstruc()
Ready determines if the stage is ready for the next branch
instruction to be loaded. Load actually loads the next
instruction into the stage. In the BranchMispredict class, instead of
a single parameter, Load has three parameters. The two
additional parameters, predictedTaken and mispredictAddress, have
previously been calculated in the BranchExecute stage.
void BranchMispredict::Load(Instruction* i, bool predictedTaken,
WORD mispredictAddress)
CurInstruc returns the instruction that was in the stage during
the cycle that has just completed.
Like all processing units and sub-units in the simulator, each of
these classes has three member functions which, when called,
perform a portion of the work for that cycle.
void StartStage()
void DoStage()
void EndStage()
The intent of StartStage, DoStage, and EndStage
is to support nondeterministic parallelism. These stages represent the
start, do, and end phases of the clock cycle. Most BPU actions,
however, are only important at either the extreme beginning or end of
a cycle, so the do phase does nothing in any of the BPU's
sub-classes (please see the note in the TODO section below).
Branch prediction
A significant amount of BPU code is dedicated to branch prediction and
misprediction recovery. Some integer instructions, like cmp, modify
fields of the CR which are read by later BPU instructions. Branch
prediction is needed when dependent BPU instructions following integer
instructions reach the BE stage before their integer counter-parts
reach the IE stage in the integer pipeline. To avoid stalling or RAW
data hazards when branches are not "resolved", the BPU makes an
educated guess about which direction to branch and makes sure to
cleanup afterwards when it guesses incorrectly. The prediction
algorithm is to predict all branches as not taken, unless the branch
is a "bc" instruction branching backwards. If the last bit of a
branch instruction's BO field is asserted (1), the prediction is
reversed. This branch prediction code is in
BranchExecute::StartStage and
BranchExecute::Conditional. Sometime after the branch is
predicted and the processor begins executing the predicted path, the
branch is resolved. If the branch was predicted incorrectly earlier,
the instruction queue and integer unit are flushed of predicted
instructions and the processor begins executing the untaken path.
This misprediction recovery code is in
BranchMispredict::EndStage.
BranchExecute class
During the start phase, the BranchExecute class calculates the
target address for the current branch instruction. If the branch is
conditional, the mispredict address is calculated and the
BranchExecute class determines if the branch has been resolved yet
(IU::CompletedIE). If the branch is unconditional or the
branch has been resolved and should be taken, the dispatcher is
instructed to fold the instruction queue after the current
instruction's address (Dispatcher::Fold) and the fetcher is
instructed to begin fetching at the target address
(Fetcher::Fetch). Otherwise, the branch is predicted as
described earlier and if the branch is predicted as taken, the
instruction queue is folded and the fetcher begins fetching at the
target address (Fetcher::Fetch). Unresolved branches are then
passed on to the MR stage (BranchMispredict::Load). The
BranchExecute class does nothing during the do phase. During
the end phase, instructions that have not been passed to the MR
stage and that update either the LR or CTR registers are passed on to
the BW stage (BranchWriteback::Load).
BranchMispredict class
During the start phase, the BranchMispredict class instructs the
integer unit not to let any instructions from the predicted path past
the ID stage (IU::SetPredict). Nothing is done here either
during the do phase. A majority of the BranchMispredict's work is
performed during the end phase. The BranchMispredict class first
determines whether the current branch instruction has been resolved
yet (IU::CompletedIE) and if so, whether the branch was
predicted correctly. If the branch prediction was correct, the
integer unit is instructed to clear the block preventing predicted
instructions from passing beyond the ID stage
(IU::ClearPredict). If the prediction was incorrect, predicted
instructions in the integer unit and instruction queue are flushed
(Dispatcher::Flush and IU::FlushPredict) and the fetcher
is instructed to begin fetching at the mispredict address
(Fetcher::Fetch). If the branch instruction has modified the
LR or CTR, the instruction is passed on to the BW stage
(BranchWriteback::Load). If the branch has not been resolved
yet, the current instruction remains in the MR stage until the next
cycle.
BranchWriteback class
All of the work for the BranchWriteback class is performed in
the start phase. Up to nine branch instructions may be waiting to
writeback in the writeback stage. In the start phase, the
BranchWriteback class determines if any of the waiting instructions
can writeback by querying the integer unit (IU::CompletedIC).
If so, the registers that have been modified by each instruction are
updated in BranchRegs::Update and the instructions are
retired. Nothing happens during either the do phase or
the end phase.
BPU class
Each of the simulator's processing units has been designed to
encapsulate it, as much as possible, from the other parts of the
simulator. Thus, an outside user of the BPU (IU, Dispatcher, ...)
should have no direct access to the BranchRegs, BranchExecute,
BranchMispredict, or BranchWriteback sub-classes. Access to important
features of the BPU is given indirectly through the BPU class itself.
enum BPU::Reg {CR0, CR1, CR2, CR3, CR4, CR5, CR6, CR7, CR, LR, CTR}
bool BPU::Ready()
bool BPU::Load(Instruction* i)
WORD BPU::ReadReg(Reg reg)
void BPU::WriteReg(Reg reg, WORD value)
void BPU::StartStage()
void BPU::DoStage()
void BPU::EndStage()
void BPU::ReportStats()
These functions are basically wrappers around calls to other functions
of the same (or similar) name within BPU sub-classes. Ready
and Load call BranchExecute::Ready and
BranchExecute::Load, respectively. ReadReg and
WriteReg call BranchReg::ReadIU and
BranchReg::WriteIU. StartStage, DoStage, and
EndStage call each of the functions of the same name in the
BranchWriteBack, BranchExecute, and BranchMispredict classes, in
order. ReportStats calls CurInstruc in each of the
BranchWriteBack, BranchExecute, and BranchMispredict classes and sends
this information to the user interface.
TODO
The simulator's BPU is not yet a completely accurate simulation of the
PPC601's BPU.
Primarily, delayed purge is not implemented. When delayed purge is
working and a branch is predicted as taken, the instructions on the
not-taken path are not immediately purged from the instruction queue
and integer unit. Instead, these instructions are allowed to execute
until the instructions from the taken path enter the instruction queue
from the cache. If the branch is resolved as not taken before the
instructions from the taken path return from the cache, the taken
instructions are flushed, and the processor continues executing the
not-taken path.
Also, communication between the BPU and FPU is lacking. Ideally, when
predicting branches, the BPU should interact with the FPU much as it
does with the integer unit. The FPU should have analogies to
IU::SetPredict, IU::FlushPredict, and
IU::ClearPredict.
The above changes will probably require improvement of the instruction
tagging and register dependency mechanisms. Currently, tags are
primarily used for synchronization of instructions in the BPU and FPU
with instructions in the integer unit. This synchronization
relationship is not mutual, thus there will be problems if, for
instance, the integer unit is ahead of the FPU. A mechanism probably
needs to be implemented to solve this problem and also provide RAW
dependency checking within the pipelines themselves.
Zero cycle BPU dispatch and requests force a lot of the BPU's actions
to occur in the start phase and end phase. However, some of
the code that is currently in the start phase and end phase
does not have to be there and can safely be moved into the do phase
to more closely support the simulator's cycle design.
Floating Point Unit
Return to Design Outline