INSTRUCTION UNIT


 -------
|LOADER|-------------------------------------------------------------------------  
 -------									|
     |										|
     |			   							V
     |                        ---------                                    ---------
     ----------------------->| FETCHER |---------------------------------->| Cache |
      ----------------------> ---------                                    ---------
     |				  ^						 ^
     |				  |						 |
     |			   -----------------    			         |
     | 			  |      q 7        |					 |
     |			   -----------------					 |
     |			  |       q 6        |					 |
     |			   -----------------					 |
     |			  |       q 5        |					 |
     |			   -----------------					 |
     |			  |       q 4        |					 |
     |		           ------------------ ------				 |
     |		     	  |       q 3        |	    |				 |
     |		    	   -----------------	    |				 |
     |		    	  |       q 2        |	    |				 |
     |          	   ----------------- 	    |<------| 	                 |
     |		    	  |       q 1        |	    |	    |			 |
     |		    	   ----------------- 	    |	    |		  	 |
     |		    	  |       q 0        |	    |	    |			 |
     |		           -----------------  ------	    |			 |
     |			         |       |		    |			 |
     |				 |	 |	 	    |			 |
     |				 |	 |  		    |			 |
     |				 V	 V		    |			 |
     |			    --------------	            |			 |
     |		           |  DISPATCHER |\------------------	                 |
     |			    -------------- \    	    			 |
     |			   /	     \      \   			         |
     |		          /           \      \  			         |
     |                   V             V      --V                                |
     |                 -----         -----      -----                            |
     |		      | BPU |       | FPU |    | IU  |                           |
     |                 -----         -----      -----                            |
     |                   |             |          |                              |
     |                   |             |          |                              |
     |		         |             | 	  |                              |
      ---------------------------------           --------------------------------

Instruction Unit Diagram


The instruction unit consists of three main parts: the Instruction Queue (IQ), the Fetcher, and the Dispatcher. Instructions are four bytes long and word-aligned. Bits 0-5 always specify the primary opcode. Many instructions also have a secondary opcode. The remaining bits hold one or more fields, depending on the instruction format.
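As a rough illustration of the encoding described above, the sketch below extracts the primary opcode from a 32-bit instruction word. It assumes big-endian bit numbering, in which bit 0 is the most significant bit, so bits 0-5 are the top six bits of the word. The function names are hypothetical and are not taken from the simulator.

```python
INSTRUCTION_BYTES = 4  # every instruction is four bytes long


def primary_opcode(word: int) -> int:
    """Return bits 0-5 of a 32-bit instruction word.

    Assumption: bit 0 is the most significant bit, so the primary
    opcode occupies the top six bits of the word.
    """
    return (word >> 26) & 0x3F


def is_word_aligned(address: int) -> bool:
    """Return True if the address falls on a four-byte boundary."""
    return address % INSTRUCTION_BYTES == 0
```

For example, under that assumption the word 0x38600001 has primary opcode 14, and the address 0x102 is rejected as misaligned.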

The Fetcher contains an adder that computes the address of the next sequential instruction from the address of the last instruction fetched. As soon as the Fetcher gets the next instruction, it increments the base address by four (because each instruction is four bytes long). However, if the Fetcher receives an instruction address from the BPU (branch processing unit), the base address is replaced by that new address. The Fetcher can fetch up to eight instructions in a clock cycle.

The Fetcher interacts with the loader, BPU, IQ, and cache unit, and every instruction request must go through it. The loader is the unit that initially places the program in memory, so it knows the starting address of the program. The Fetcher first receives that starting address from the loader and then sends it to the cache unit. Before sending any request to the cache, the Fetcher checks whether the IQ is full; if it is, the Fetcher does not request any instructions. After the initial address has been sent, the cache puts the instruction into the IQ.

Every instruction requested by the Fetcher goes to the cache, which serves requests according to a priority rule. Requests from the instruction unit have the lowest priority. If the cache is busy servicing a request from another unit, the instruction unit must wait, although it can still dispatch another instruction while waiting.
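The address logic described above can be sketched as follows. This is a simplified model, not the simulator's code: it issues one address per call, advances the base address at request time rather than when the instruction arrives, and ignores cache priority. The class and parameter names are hypothetical.

```python
from typing import Optional

INSTRUCTION_BYTES = 4  # instructions are four bytes long
IQ_CAPACITY = 8        # the IQ holds eight instructions


class Fetcher:
    def __init__(self, start_address: int) -> None:
        # The loader supplies the starting address of the program.
        self.base_address = start_address

    def next_request(self, iq_occupancy: int,
                     branch_target: Optional[int] = None) -> Optional[int]:
        """Return the address to request from the cache, or None if the IQ is full."""
        # An address from the BPU replaces the sequential base address.
        if branch_target is not None:
            self.base_address = branch_target
        # The Fetcher never issues a request while the IQ is full.
        if iq_occupancy >= IQ_CAPACITY:
            return None
        address = self.base_address
        # The adder advances to the next sequential instruction.
        self.base_address += INSTRUCTION_BYTES
        return address
```

A caller would invoke `next_request` once per request, passing the current number of occupied IQ entries and, when the BPU has resolved a branch, the new target address.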

The IQ consists of eight segments (the size of one cache block), each four bytes long. Q0 - Q3 are searched by the dispatcher for BPU and FPU (floating point unit) instructions, which can be dispatched in zero and one clock cycles respectively. Q4 - Q7 provide buffering to reduce the frequency of cache accesses. The Dequeue procedure discards the used instructions in the queue and shifts the unused instructions down, in order, starting from Q0. The dispatcher looks only in Q0 for integer instructions to dispatch. If Q0 holds an integer instruction, the dispatcher sends it to the IU (integer unit), and the IU decodes it in the same cycle. Q0 is then emptied and the remaining entries in the queue move down.
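A minimal sketch of the Dequeue behavior described above, assuming the queue is modeled as a list of eight slots with Q0 at index 0, empty slots represented as None, and a parallel list of flags marking which entries have been used. These representations are assumptions made for illustration.

```python
from typing import List, Optional

IQ_SIZE = 8  # eight segments, Q0 through Q7


def dequeue(queue: List[Optional[str]], used: List[bool]) -> List[Optional[str]]:
    """Discard used entries and shift the remaining ones down toward Q0.

    The relative order of the unused instructions is preserved, and the
    freed slots at the top of the queue are left empty (None).
    """
    remaining = [entry for entry, was_used in zip(queue, used)
                 if entry is not None and not was_used]
    return remaining + [None] * (IQ_SIZE - len(remaining))
```

If Q0 and Q2 were dispatched in a cycle, the entries that were in Q1 and Q3 end up in Q0 and Q1 afterward.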

Instructions are dispatched to the appropriate execution units by the dispatcher. Branch and floating point instructions can be dispatched out of order from any of Q0 - Q3. Integer instructions must be dispatched only from Q0, which maintains a consistent order for the later writeback of registers. A total of three instructions can be dispatched in one clock cycle. Branch and integer instructions are dispatched in zero cycles, and floating point instructions take one cycle to dispatch.

Some instructions are dispatched to both the FPU and the IU: float loads and stores, and the special case of fcmp (floating point compare) instructions. For float stores, when the dispatcher encounters an stfs, it dispatches it to the FPU and marks it FPU_DISPATCHED. It then treats the instruction as if it were an integer instruction: in order to decode the effective address of the destination operand, the instruction must reach Q0 and be dispatched to the IU. Fcmp is similar in that it is dispatched to the FPU, but once dispatched it is removed from the queue, and a bubble is created and put in its place. This bubble is treated as a no-op. The reason is that when the bubble reaches the complete stage of the IU pipeline, it must stall the IU until the fcmp in the FPU is ready to write back the new value of the CR field it is modifying. Once the new value has been written back, the no-op is allowed to exit the IU and processing continues.
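The selection rules above can be sketched as follows. The sketch assumes at most one instruction per execution unit per cycle and that the lowest-numbered matching slot wins; the text states only that three instructions in total can be dispatched. Instruction kinds are represented as plain strings, and the special handling of stfs and fcmp is left out.

```python
from typing import Dict, List, Optional

SEARCH_WINDOW = 4  # branch and float instructions are searched in Q0-Q3


def select_for_dispatch(queue: List[Optional[str]]) -> Dict[str, int]:
    """Map each execution unit to the queue slot it would receive this cycle.

    queue[i] holds the kind of instruction in Qi: "branch", "float",
    "integer", or None for an empty slot.
    """
    chosen: Dict[str, int] = {}
    # Integer instructions are dispatched only from Q0.
    if queue and queue[0] == "integer":
        chosen["IU"] = 0
    # Branch and float instructions may come from any of Q0-Q3.
    for slot, kind in enumerate(queue[:SEARCH_WINDOW]):
        if kind == "branch" and "BPU" not in chosen:
            chosen["BPU"] = slot
        elif kind == "float" and "FPU" not in chosen:
            chosen["FPU"] = slot
    return chosen
```

With an integer instruction in Q0, a float in Q1, and a branch in Q3, all three units receive work in the same cycle. A branch sitting in Q4 is not considered until it moves down into the search window.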

In one clock cycle, the Fetcher and the Dispatcher each execute four phases. These phases represent the division of work performed at the start, middle, and end of the clock cycle. The four phases are: start phase, do phase, end phase, and report status phase.
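One way to picture the four phases is a per-cycle loop like the sketch below. It assumes that each phase runs for every unit before the next phase begins, which the text does not state. The class and method names are hypothetical, and each phase only records that it ran.

```python
from typing import List, Tuple

# The four phases, in the order they run within one clock cycle.
PHASES = ("start", "do", "end", "report_status")


class Unit:
    """Stand-in for the Fetcher or the Dispatcher; each phase only logs itself."""

    def __init__(self, name: str, log: List[Tuple[str, str]]) -> None:
        self.name = name
        self.log = log

    def run_phase(self, phase: str) -> None:
        # A real unit would do its start, middle, or end of cycle work here.
        self.log.append((self.name, phase))


def run_cycle(units: List[Unit]) -> None:
    """Run one clock cycle: every unit completes a phase before the next starts."""
    for phase in PHASES:
        for unit in units:
            unit.run_phase(phase)
```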

