Homework 6: Loop transformations, Cache optimizations

Course: CS 380C: Advanced Compiler Techniques (Fall 2007)
Instructor: Keshav Pingali
Assigned: Tuesday, November 20, 2007
Due: Monday, December 3, 2007, 11:59:59 PM

You can do this assignment in groups of two. Turn in your assignment on time; turnin will be disabled soon after the deadline.

1. Objective
------------

The objective of this assignment is to perform optimizations that exploit the memory hierarchy. This assignment involves:

  1. Reading in, transforming and writing out an Abstract Syntax Tree (AST)
  2. Performing loop tiling on the AST
  3. Identifying the optimal tile size on a given machine

2. AST format
-------------

Our C-subset compiler now comes with an AST reader (ast-to-c) and writer (c-to-ast). The format of the AST is briefly explained here with an example. The authoritative reference is the code in the file 'cst.h', which contains the logic to read and write an AST.

Consider the following AST output:

    195
    1 56 2 8 0 0 ""
    2 42 3 4 0 0 ""
    3 38 0 0 0 0 "long"
    4 41 5 6 0 0 ""
    5 40 0 0 0 0 "a"
    ...

The first line gives the number of nodes in the AST. You can use this to pre-allocate AST nodes, if needed. Starting with the second line, each line represents a single AST node. The fields in each line are:

  field 0: The AST node number, or id.
  field 1: A number representing the type of the node. These numbers correspond to the enum values (enumerator names starting with AST) in the file cst.c.
  fields 2, 3, 4: The numbers/ids of the first, second and third children, respectively, of this node. A value of zero indicates that the field is not used.
  field 5: An integer value for this node, if needed. This is used only for long constants.
  field 6: A string value for this node, if needed. This is used for identifiers, typenames, etc.

You may choose to write your own parser to read and write ASTs, or just use the code provided (in ast-to-c.c and c-to-ast.c).
You should be able to identify function bodies, loop bodies, etc. in the AST. All transformations in this assignment will be performed on the AST: you should read in the AST, perform your transformations, and write out an AST. You can then pass this AST to 'ast-to-c' to obtain the transformed C code.

ast-to-c reads an AST from standard input and writes C code to standard output. c-to-ast accepts a filename as a command-line argument and writes an AST to standard output. For example, they can be invoked as

  $ ./c-to-ast ../examples/gcd.c > AST.txt
  $ ./ast-to-c < AST.txt > new-gcd.c

3. Performing loop tiling on the AST
------------------------------------

Loop tiling is a loop transformation that consists of two parts: strip-mining and loop interchange. This involves identifying nested loops in the AST and performing these transformations.

It is always legal to perform strip-mining. Strip-mining may require cleanup code, depending on the strip size chosen: when the strip size does not evenly divide the loop's trip count, the final, partial strip must be handled separately.

Loop interchange is not always legal; its legality depends on the dependences between iterations of the loop. To compute dependences and check for legality, you will use the Omega calculator. The input to Omega is a set of integer linear inequalities in almost the same format as presented in class.

Omega is installed on the UTCS machines and on ham.csres (the PowerPC machine) in the following locations:

  UTCS:
    Omega calculator: /p/graft/omega/omega_calc/obj/oc
    Sample input files: /p/graft/omega/omega_calc/test_parser
    Documentation: /p/graft/omega/omega_calc/doc/calculator.dvi

  ham.csres:
    Omega calculator: /opt/omega/omega_calc/obj/oc
    Sample input files: /opt/omega/omega_calc/test_parser
    Documentation: /opt/omega/omega_calc/doc/calculator.dvi

Omega reads input from stdin and writes to stdout. You may also use the Omega library and link to it directly from your program. The Omega library is installed in the following directories:

  UTCS: /p/graft/omega/omega_lib
  ham.csres: /opt/omega/omega_lib

4. Identifying the optimal tile size on a given machine
-------------------------------------------------------

Loop tiling is a transformation that tries to make the best use of the memory hierarchy. The optimal tile size depends on the configuration of the particular cache level being optimized for. In this assignment your transformation should accept the tile size as a parameter. You should optimize for the level 2 cache: find the right tile size and pass it to the compiler.

Finding the right tile size is orthogonal to actually performing the transformation. For this assignment, find the tile size by performing a simple iterative search: set the tile size to various powers of 2 (2, 4, 8, 16, ...) and determine which tile size gives the best performance.

5. Workflow
-----------

Your code for this assignment should read an AST as input, perform loop tiling (with the tile size as a parameter), and produce an AST as output. Use 'ast-to-c' to see how the transformed code looks. Also, compile the transformed code using gcc at optimization level -O3, and measure execution time on ham.csres. Remember to do multiple runs (at least 5) and take the median value of the execution time. Please start working on this assignment early, to avoid running experiments on ham.csres at the same time as other students.

6. Output
---------

You will be provided with three simple loop kernels. These programs are available in the source tarball. Do the assignment and produce as verbose an output as possible. For each of the loops, you should provide the following (in the tarball that you turn in):

  1. The transformed AST for the optimal tile size
  2. The transformed C code
  3. Execution times for the various tile sizes
  4. Graphs of execution time versus tile size
  5. Complete details of your experiments (so that someone else can reproduce them)

As mentioned earlier, be as verbose as possible.

7. Turning in your assignment
-----------------------------

Download this tarball:
http://www.cs.utexas.edu/users/pingali/CS380C/2007fa/assignments/assignment6/assignment6.tar.gz

It is organized similarly to the previous homeworks. Your submission should contain the following:

  1. A single tar.gz file named hw6.tar.gz, which, when extracted, creates the directory hw6.
  2. The hw6 directory can contain sub-directories.
  3. The hw6 directory should contain a README file. Please include your name(s) and UTEID(s) here. In addition, it should contain all the information (or pointers to it) mentioned in the previous section. Also turn in your code (though the TA will not be running it).

The hw6 directory, with these files, already exists in the tarball you downloaded.

Turn in your assignment by running the following commands on a UTCS Linux machine.

  $ # Go to the parent directory of the hw6 directory.
  $ tar -zcvf hw6.tar.gz hw6
  $ turnin --submit suriya cs380c-hw6 hw6.tar.gz
  $ turnin --list suriya cs380c-hw6

Please use turnin to submit your assignment. Only homeworks that are turned in using the procedure described above will be accepted.

8. Tips
-------

  0. Start early :)
  1. Watch the clarifications page:
     http://www.cs.utexas.edu/users/pingali/CS380C/2007fa/clarifications.html
  2. Suriya will conduct a tutorial session on Wednesday, November 21st, 3-4, instead of the regular office hours. The location is RAS 310.