CS 380C: Programming Assignment 4

      Assignment: Register allocation for PowerPC processor
      Instructor: Kathryn S. McKinley
      Assigned: Monday, October 26, 2009
      Due: Friday, November 20, 2009, 5PM

    Objective

    The goal of this assignment is to implement the register allocation algorithm for the PowerPC processor on Linux.

    1. Basic register allocation based on graph coloring or linear scan algorithm. (100%)
    2. Extra Credit: Rematerialization. (15%)

    Project Description

    You will implement a register allocation algorithm in your compiler. As in prior assigments, your compiler will call csc compiler on subset C files. Then, your register allocator will take the 3-address format program as the input, and perform register allocation for the PowerPC processor. It will generate an intermediate format called RA3 code (register allocated 3-address format code) as the output. Then, you will translate the code in RA3 from to PowerPC assembly code by calling a translator that we provide. You need to test your code on PowerPC machine, and generate an analysis report based on the results.

    Getting Started On Lab1

    1. Download the c-subset compiler and example C files at http://www.cs.utexas.edu/users/mckinley/380C/labs/cs380c_lab4.tgz
    2. Decompress the file on the cs unix machines using
      	"tar xvfz cs380c_lab4.tgz".
      
      This command will produce the following directory structure
      	cs380c_lab1
      		3addr_to_ppc
      		examples
      		lab4
      		src
      
      The source files of the 3-addr to PowerPC translator are in the 3addr_to_ppc directory. Here is an example call to the translator:
      cd 3addr_to_ppc; run.sh -backend=ppc <../example/s4.3addr.ra3
      The example files of the RA3 form code are in the example directory. Among them, s4.c is the original C source file, s4.3addr is the 3-addresss form code, s4.3addr.ra3 is the RA3 form code, and s4.s is the translated code for the PowerPC machine.
    3. Implement your own new register allocation component in the lab4 directory. As usual, your compiler should accept 3-address code as input from stdin, and write output to stdout. Your compiler back end should generate the following output:
      1. RA3 format code with the specified number of registers registers.
      2. An optimization report
      Your compiler will be invoked by the script 'run.sh', and should accept the following command line arguments.
      1. -opt, a comma separated list of transformations. The transformations you will support are:
        • rga,x: perform basic register allocation with x available registers, where x is a number in the set: (4, 6, 8, 10, 12, 14, 16).
        • rem: specifies perform rematerialization.
        For example:
        • -opt=rga,16 means perform basic register allocation with 16 registers
        • -opt=rga,16,rem means perform basic register allocation and rematerialization with 16 available registers
      2. -backend specifies which backend to use: ra3 or rep.

    Implementation Specification

    Perform register allocation from the source of your choice. You will generate the RA3 format code, which is an extension of the 3-address format.

    1. RA3: Register allocated 3-address format

      RA3 formatis very similar to 3-address. We list the differences below.
      1. A new operand type, register, is defined in the RA3 format. It ends with "~". For example, "r20~" is a legal register operand. r20~ specifies that the operand is in register 20. (See the PPC register table below for valid register names.) When storing the result of an instruction to a register, generate the register operand at the beginning of the instruction as follows:
          "r23~instr 15: mul r20~ 10."
          This instruction has two operands, the 1st is the register r20, the 2nd is a constant, the result is stored in register r23.
      2. The following instructions access the global variable.
        A: Load global variable:
        	instr 6: add res_base#32756 GP
        	r19~instr 7: load res_base#32756
        
        You only need to allocate one register for the results of the 2nd instruction. Please notice the format of the 2nd instruction is different from the 3-address format.
        
        			B: Store global variable:
        
        	instr 4: add res_base#32756 GP
        	instr 5: store 18  res_base#32756
        
        This instruction stores a constant 18 in a global variable with the address 32756.
        
        
        			If a register is stored, the instructions are:
        
        	instr 4: add res_base#32756 GP
        	instr 5: store r18~  res_base#32756
        
        This instruction stores the contents of register r18 in a global variable with the address 32756.
        
        			
      3. The local variables, such as x#-4, y#-8, are on the stack after any function is called. Operands in this format will access memory, rather than using registers.
      4. Only load/store instructions can access local variables directly.
      5. All other instructions must have register or constant operands. The only exception is the address for the global variables: the add instruction, that is computing the address, accesses the GP and offset directly.
      6. The WriteLong() function can print the value of register operand. It adds a \n at the end. You must not call WriteLine() any more.
      7. The max number of the parameters for each function is 8.
      8. Array and Structure are not supported.

      The example of a 3-address format code and its RA3 format code are listed below:


      C source code:
      	long res;
      	void main()
      	{
      	    long x, y, z;
      	    res = 8;
      	    x =res;
      	    x =0;
      	    y =1;
      	    z =2;
      	 
      	    y = x *10;
      	    z = y + 1000;
      	    if (z<=2000) {
      	                    z = z + 5;
      	    }
      	    res = z;
      	    WriteLong(z);
      	    //printf("%d \n", z);
      	}
      
      3-address format code:
      	instr 1: nop
      	instr 2: entrypc
      	instr 3: enter 12
      	instr 4: add res_base#32764 GP
      	instr 5: store 8 (4)
      	instr 6: add res_base#32764 GP
      	instr 7: load (6)
      	instr 8: move (7) x#-4
      	instr 9: move 0 x#-4
      	instr 10: move 1 y#-8
      	instr 11: move 2 z#-12
      	instr 12: mul x#-4 10
      	instr 13: move (12) y#-8
      	instr 14: add y#-8 1000
      	instr 15: move (14) z#-12
      	instr 16: cmple z#-12 2000
      	instr 17: blbc (16) [20]
      	instr 18: add z#-12 5
      	instr 19: move (18) z#-12
      	instr 20: add res_base#32764 GP
      	instr 21: store z#-12 (20)
      	instr 22: write z#-12
      	instr 23: ret 0
      	instr 24: nop
      
      RA3 format code:
      	instr 1: nop
      	instr 2: entrypc
      	instr 3: enter 12
      	instr 4: add res_base#32764 GP
      	instr 5: store 8  res_base#32764
      	instr 6: add res_base#32764 GP
      	r19~instr 7: load res_base#32764
      	instr 8: store r19~ x#-4
      	instr 9: store 0 x#-4
      	instr 10: store 1 y#-8
      	instr 11: store 2 z#-12
      	r20~instr 12: load x#-4
      	r21~instr 13: load y#-8
      	r22~instr 14: load z#-12
      	r23~instr 15: mul r20~ 10
      	instr 16: move r23~ r21~
      	r24~instr 17: add r21~ 1000
      	instr 19: move r24~ r22~
      	instr 20: store r21~ y#-8
      	instr 21: store r22~ z#-12
      	instr 23: cmplt r22~ 2000
      	instr 24: blbc (23) [27]
      	r22~instr 25: add r22~ 5
      	instr 26: store r22~ z#-12
      	instr 27: add res_base#32764 GP
      	instr 28: store r22~  res_base#32764
      	instr 29: write r22~
      	instr 30: ret 0
      	instr 31: nop
      
    2. Usage Constraints for the PPC registers
      The PowerPC architecture provides 32 general purpose registers, each 32 bits wide. Here are general purpose registers and their usage.
      Register Name Usage
      r0 Volatile register which may be modified during function linkage
      r1 Stack frame pointer, always valid
      r2 System-reserved register
      r3-r4 Volatile registers used for parameter passing and return values
      r5-r10 Volatile registers used for parameter passing
      r11-r12 Volatile registers which may be modified during function linkage
      r13 Small data area pointer register
      r14-r30 Registers used for local variables Only these numbers are available to your register allocator.
      r31 Used for local variables or "environment pointers"
      CR0-CR7 Condition Register Fields, each 4 bits wide
      LR Link Register

      Since we provide the translation from RA3 format into PPC, you will not implement the ABI component (i.e., the calling convention that saves and restores registers on a procedure call and return). However, you need to notice:
      1. Only registers r14-r30 are available for local variables that you will allocate.
      2. r1 is the stack frame pointer.
      3. All data is 32 bits.
    3. Report format
      You will run your code on a CS linux machine, and generate the ra3 code. when "-backend=rep" is speficied, please generate the report in the form as below, note, the number of resisters is specified by "-opt =rga"

      C File: testcase1.c
      <Graph coloring/Linear scan> 
      Register number: 4, Number of Spills: 20
      
      Then, You will translate the ra3 code into ppc assembly code by calling the translator provided on a CS Linux machine. You will test your assebmly code on a PPC machine (we will send you account information shortly). You will report the compilation and execution time. Please report the execution time with variant number of registers as below, note, you need to generate this report manually, and write it into a file, named testcase1.ppc.rep.

      C File: testcase1.c
      <Graph coloring/Linear scan> Compile time: .15s
      Register number: 4, Execution time: 0.12s
      Register number: 6, Execution time: 0.08s
      Register number: 8, Execution time: 0.04s
      ... ...
      Register number: 16, Execution time: 0.02s
      

    Turning In Your Assignment

    Your assignment should contain the following.

    1. A single tgz file named lab4.tgz, which, when extracted, creates the lab4 directory.
    2. The lab4 directory can contain sub-directories.
    3. The lab4 directory should contain the following files:
      1. README - Please include your name(s) and UTEID(s) here.
      2. compile.sh - a script to compile your source code.
      3. run.sh - a script that runs your compiler. This script should read 3-address code as input from stdin and write output to stdout. The output is specified by the command line arguments described in the previous section.

    Turn in your assignment by running the following commands on a UTCS Linux machine.

    	$ # Go the parent directory of the lab1 directory.
    	$ tgz lab4 lab4/
    	$ turnin --submit hadi cs380c-lab4 lab4.tgz
    	$ turnin --list hadi cs380c-lab4