CS375: Assignment 7
X86 Assembly code.
Assigned: Tuesday, April 20th
Due: Tuesday, May 4th at 11:59pm
Individual assignment.

====0. Updates ====
Note:
* Individual assignment (100 Points)
* Extra credit available (see end)

====1. Overview ====
For this assignment you will be writing a CSC to x86 assembly compiler. Your program should accept two command line parameters: the input CSC file name and the output assembly file name.

===1.1 Recommended Approach===
You are free to implement your compiler however you want, but a good approach is to follow the way most compilers are written. Write your own set of AST classes (or use the ones from the previous assignment) and have a Visitor visit the root node of the program and generate the X86 code. This way you only need to handle one node at a time. To pass the result from one node in the AST to another, you can use the registers or the stack.

For instance, to compute a = EXP1 + EXP2 you would have the visitor generate the code for the expression on the right hand side and then store its result in variable a. This implies storing the result of the expression in some memory location, so variable a must be assigned some memory. Since you are only dealing with long (int) types, all of which occupy 4 bytes of memory, you need not be concerned about alignment issues.

Now, back to computing the result of the above AST. To compute the value of variable a, you would visit the AST node of EXP1, which generates code that leaves the result of that expression on the stack; similarly, EXP2 leaves its result on the stack. The AST node for addition (+) generates code that pops the two values from the stack, adds them, and pushes the result back on the stack. The assignment operator "=" generates code that pops the value off the stack and copies it to the memory location assigned to variable "a".
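The stack-based scheme described above can be sketched in Java as follows. This is only a minimal illustration, not a required design: the class names (Node, Num, Var, Add, Assign, CodeGenVisitor, StackCodeGenSketch) are hypothetical, and the sketch assumes that every variable gets a 4-byte slot at a negative offset from EBP and that the function prologue (not shown) has already reserved that space.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical AST node types; the assignment does not prescribe these names.
interface Node { void accept(CodeGenVisitor v); }

final class Num implements Node {
    final int value;
    Num(int value) { this.value = value; }
    public void accept(CodeGenVisitor v) { v.visit(this); }
}

final class Var implements Node {
    final String name;
    Var(String name) { this.name = name; }
    public void accept(CodeGenVisitor v) { v.visit(this); }
}

final class Add implements Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    public void accept(CodeGenVisitor v) { v.visit(this); }
}

final class Assign implements Node {
    final String target;
    final Node value;
    Assign(String target, Node value) { this.target = target; this.value = value; }
    public void accept(CodeGenVisitor v) { v.visit(this); }
}

// Emits AT&T-syntax stack code: every expression leaves its result on the stack.
final class CodeGenVisitor {
    private final StringBuilder out = new StringBuilder();
    // Assumption: variables live at -4(%ebp), -8(%ebp), ... in order of first use.
    private final Map<String, Integer> offsets = new LinkedHashMap<>();

    private String slot(String name) {
        Integer off = offsets.get(name);
        if (off == null) {
            off = -4 * (offsets.size() + 1);
            offsets.put(name, off);
        }
        return off + "(%ebp)";
    }

    private void emit(String line) { out.append(line).append('\n'); }

    void visit(Num n) { emit("pushl $" + n.value); }

    void visit(Var v) { emit("pushl " + slot(v.name)); }

    void visit(Add a) {
        a.left.accept(this);       // result of EXP1 is pushed
        a.right.accept(this);      // result of EXP2 is pushed
        emit("popl %ecx");         // ECX = EXP2
        emit("popl %eax");         // EAX = EXP1
        emit("addl %ecx, %eax");   // EAX = EXP1 + EXP2
        emit("pushl %eax");        // leave the sum on the stack
    }

    void visit(Assign s) {
        s.value.accept(this);      // right hand side is pushed
        emit("popl %eax");
        emit("movl %eax, " + slot(s.target));
    }

    String code() { return out.toString(); }
}

final class StackCodeGenSketch {
    public static void main(String[] args) {
        CodeGenVisitor v = new CodeGenVisitor();
        new Assign("b", new Num(5)).accept(v);                        // b = 5
        new Assign("a", new Add(new Var("b"), new Num(3))).accept(v); // a = b + 3
        System.out.print(v.code());
    }
}
```

Running StackCodeGenSketch prints the instruction sequence for b = 5 followed by a = b + 3. ECX is used as the second temporary because it is caller-saved under the C calling convention, so the sketch does not need to preserve it.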
In this approach, you essentially generate stack code, using registers as temporary storage for intermediate results during expression evaluation. For extra credit, figure out how to use registers in a more intelligent way to reduce the number of pushes and pops to the stack (see the end of the assignment).

====2. X86 Assembly language ====
You can use gcc to generate assembly language code using the following command:
gcc -m32 -S source.c
which generates the assembly language code for source.c into source.s. You can compile the assembly language file source.s using gcc as well:
gcc source.s
which produces a.out as the executable. You can modify the assembly code and use gcc to build executables from the modified assembly as required. The best way to figure out what sequence of assembly language instructions should be generated for each operation is to build small programs and have gcc generate the assembly language for them. Consider the following CSC code:
----------------------------------------------------------------------------------------
#include <stdio.h>
#define WriteLine() printf("\n");
#define WriteLong(x) printf(" %lld", (long)x);
void main() {
    long a = 25733;
    WriteLong(a);
    WriteLine();
}
----------------------------------------------------------------------------------------
The assembly code for the above is as follows; the comments should help you understand the details:
----------------------------------------------------------------------------------------
.file "sample.c"
.section .rodata
.LC0:                   #strings for use.
.string " %lld"         #the printf format specifier.
.text                   #code section
.globl main             #main is a global label
.type main, @function   #main is a function
main:                   #main begins here.
pushl %ebp              #save base-pointer on stack
movl %esp, %ebp         #copy stack-pointer to base pointer.
andl $-16, %esp         #align the stack pointer to a 16-byte boundary
subl $32, %esp          #allocate space for local variables (1 long)
movl $25733, 24(%esp)   #move 25733 to ESP+24
movl $0, 28(%esp)       #move 0 to ESP+28
movl $.LC0, %ecx        #move address of " %lld" to ECX
movl 24(%esp), %eax     #move value at ESP+24 to EAX = 25733
movl 28(%esp), %edx     #move value at ESP+28 to EDX = 0
movl %eax, 4(%esp)      #move EAX to ESP+4 = 25733
movl %edx, 8(%esp)      #move EDX to ESP+8 = 0
movl %ecx, (%esp)       #move ECX to ESP = address of " %lld"
call printf             #call printf, which expects the address of the format string at ESP
movl $10, (%esp)        #move 10 ('\n') to ESP
call putchar            #call putchar, which prints the newline.
leave                   #undo the prologue: restore ESP from EBP and pop the saved EBP.
ret                     #pop the return address off the stack and jump to it.
.size main, .-main
.ident "GCC: (Ubuntu 4.4.1-4ubuntu9) 4.4.1"
.section .note.GNU-stack,"",@progbits
-------------------------------------------------------------------------------------------
You should leave in the two macros for WriteLong and WriteLine in your CSC files:
#include <stdio.h>
#define WriteLine() printf("\n");
#define WriteLong(x) printf(" %lld", (long)x);
since they will help you to debug your program. In the generated assembly, WriteLong can be implemented by simply having a routine that prints the contents of, say, the EAX register. Then to print any variable, all you have to do in assembly is:
1) Save the contents of EAX (since they may be used by some other routine)
2) Save other registers such as stack/base pointer etc.
3) Load the variable into EAX
4) Call the print routine
5) The print routines can be hard coded into your output assembly file, meaning you can assume that all the assembly files have those routines.
6) The print routine would do something similar to the above example, but instead of loading the value into a register, it would already have it in EAX
7) Return from the print routine back to the caller.
Similarly for WriteLine.

=== Notes ===
* [ATT vs Intel] When looking at code online, remember that there are two different ways of writing assembly code. The kind that gcc accepts is the ATT format.
The form more popular with assembly programmers is the Intel format.
* [Calling Conventions] Though not strictly necessary within your own code, if you want to call or be called by other code, you need to follow the C standard calling convention as described in class. One thing that is omitted in the slides is that the return value (if any) is passed in the register EAX.

====3. Turn-in Instructions ====
Your submission should be in a folder assignment7 which includes the following:
assignment7/src : all your source code (do not use any packages).
assignment7/src/CSCCompiler.java : main source file, which runs your compiler given an input CSC filename and output assembly filename.
assignment7/test.sh : a bash shell script that takes two arguments; the first is the input C file name and the second is the output assembly file name.
assignment7/README : readme file, containing your NAME, CSID, UTEID, and comments.
assignment7/examples/1.c : non-trivial example CSC file that you tested your code on.
assignment7/examples/2.c : a second non-trivial example CSC file.

The example files are there so I have a larger set of input files to test your code on. If you do not submit these, points will be deducted. Please verify that the assignment7 directory contains the required files (in particular a README file). You can submit your assignment by executing the following command:
turnin --submit rashid assignment7 assignment7
Your program should run via the command:
java CSCCompiler input.c output.s
and the output assembly of your compiler should compile and run via GCC:
gcc output.s
./a.out

====4. Extra Credit====
You can claim extra credit by performing one or more dataflow optimizations on the generated assembly code. These include, but are not limited to:
- Constant propagation
- Dead Statement Elimination
- Reducing the number of moves you have to make by saving/loading results from/to the appropriate register.
In order to claim credit for these optimizations, you need to submit the following:
- Description of the type of optimization in your README.
- Specific examples where you tested your code.
- Some measure of how much optimization was performed (no. of constants, no. of statements eliminated, no. of moves saved, etc.)
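
As one possible starting point for the third kind of optimization, and for reporting a "moves saved" figure, here is a minimal peephole sketch in Java. It is an illustration under stated assumptions, not a required design: the class name PushPopPeephole and its members are hypothetical, the generated assembly is assumed to be available as one instruction per line in ATT syntax, and only a register push immediately followed by a register pop is rewritten. Lines carrying comments or labels do not match and are left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical peephole pass over generated stack code.
final class PushPopPeephole {
    // Only plain 32-bit general-purpose register operands are handled.
    private static final String REG = "(%e[a-d]x|%e[sd]i)";
    private static final Pattern PUSH = Pattern.compile("\\s*pushl\\s+" + REG + "\\s*");
    private static final Pattern POP = Pattern.compile("\\s*popl\\s+" + REG + "\\s*");

    // Number of push/pop pairs rewritten, usable as an optimization measure.
    int pairsRemoved = 0;

    List<String> optimize(List<String> lines) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            Matcher push = PUSH.matcher(lines.get(i));
            if (i + 1 < lines.size() && push.matches()) {
                Matcher pop = POP.matcher(lines.get(i + 1));
                if (pop.matches()) {
                    String src = push.group(1);
                    String dst = pop.group(1);
                    // pushl %r; popl %r has no visible effect, so drop both.
                    // pushl %a; popl %b is equivalent to movl %a, %b.
                    if (!src.equals(dst)) {
                        out.add("movl " + src + ", " + dst);
                    }
                    pairsRemoved++;
                    i += 2;
                    continue;
                }
            }
            out.add(lines.get(i));
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> code = Arrays.asList(
            "pushl %eax",
            "popl %ecx",
            "addl %ecx, %eax",
            "pushl %eax",
            "popl %eax",
            "movl %eax, -4(%ebp)");
        PushPopPeephole pass = new PushPopPeephole();
        for (String line : pass.optimize(code)) {
            System.out.println(line);
        }
        System.out.println("# pairs removed: " + pass.pairsRemoved);
    }
}
```

The rewrite is safe only because the two instructions are adjacent, so no jump target can fall between them. A fuller pass would also need to handle immediate and memory operands and non-adjacent pairs.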