CS375: Assignment 4
Recursive descent parser and SaM code generator.

Assigned: Tuesday, February 16th, 2010
Due: Tuesday, February 23rd at 11:59pm
Updated: Friday, February 19th

== 0. Updates ==

1. There were some typos in the grammar. All the expressions are parenthesized, so ( is actually the literal '('. This has now been fixed.
2. You do not need to worry about redeclaration of variables. Since all local variables are defined at the beginning of a method, you do not need to worry about scoping rules for variables.
3. There will be a main method in the input program. Your parser should not depend on this; rather, this is how I will be testing your parsers.
4. There is no overloading of methods.
5. You will need a symbol table for each method, as pointed out in class. One approach is to have a separate class for the symbol table (using hash tables or any other approach). A symbol table object would be created inside your getMethod method and initialized by the getDeclarations method call. Once initialized, it would be passed to (almost) all other method invocations inside getMethod so that each rule has the appropriate information. Each method has its own symbol table.

== 1. Bali Compiler [100 points] ==

Create a handwritten recursive-descent parser and SaM code generator for the Bali language, using the SaMTokenizer as the lexical analyzer. Your compiler should take a Bali2 program file as input and produce a SaM program that executes the Bali2 program.

=== 1.1 Grammar ===

The following is the grammar specification of the Bali language. In the grammar specification, all lower-case symbols denote a literal value. Additionally, these literals are reserved words (keywords) and cannot be used as identifiers for variables or methods. Non-alphanumeric characters surrounded by single quotes denote the literal consisting of only those non-alphanumeric characters. Upper-case symbols are non-terminals. '*' means zero or more occurrences. '?'
means one or zero occurrences. '[ ]' is the character class construction operator. Parentheses are used to group sequences of symbols together.

A Bali program is a sequence of zero or more method declarations. The only type in this language is int. Each method declaration has a return type, zero or more formals, and a body. The body consists of zero or more variable declarations followed by a sequence of statements. Variables can be initialized when they are declared. Each statement is an assignment statement, a conditional statement, a while loop, a return statement, a break statement, a block, or a null statement. These statements have the usual meaning; a break statement must be lexically nested within one or more loops, and when it is executed, it terminates the execution of the innermost loop in which it is nested.

Expressions are fully parenthesized to avoid problems with associativity and precedence. The literal 'true' is the value 1. The literal 'false' is the value 0. For the purposes of expressions used in conditions, any non-zero value is true and the value zero is false.

Characters between and including '//' and the end of the line are interpreted as a comment and should be discarded.

***************************************************************
PROGRAM   -> METH_DECL*
METH_DECL -> TYPE ID '(' FORMALS? ')' BODY
FORMALS   -> TYPE ID (',' TYPE ID)*
TYPE      -> int
BODY      -> '{' VAR_DECL* STMT* '}'
VAR_DECL  -> TYPE ID ('=' EXP)? (',' ID ('=' EXP)?)* ';'
STMT      -> ASSIGN ';'
           | return EXP ';'
           | if '(' EXP ')' STMT else STMT
           | while '(' EXP ')' STMT
           | break ';'
           | BLOCK
           | ';'
BLOCK     -> '{' STMT* '}'
ASSIGN    -> LOCATION '=' EXP
LOCATION  -> ID
METHOD    -> ID
EXP       -> LOCATION
           | LITERAL
           | METHOD '(' ACTUALS? ')'
           | '(' EXP '+' EXP ')'
           | '(' EXP '-' EXP ')'
           | '(' EXP '*' EXP ')'
           | '(' EXP '/' EXP ')'
           | '(' EXP '&' EXP ')'
           | '(' EXP '|' EXP ')'
           | '(' EXP '<' EXP ')'
           | '(' EXP '>' EXP ')'
           | '(' EXP '=' EXP ')'
           | '(' '-' EXP ')'
           | '(' '!' EXP ')'
           | '(' EXP ')'
ACTUALS   -> EXP (',' EXP)*
LITERAL   -> INT | true | false
INT       -> '-'? [1-9] [0-9]*
ID        -> [a-zA-Z] ( [a-zA-Z] | [0-9] | '_' )*

If a program does not satisfy the grammar above or does not satisfy the textual description of the language, your compiler should print a short, informative error message and/or exit with a non-zero exit status.

=== 1.3 Template ===

You can use the template below to help you get started.

package assignment4;

public class BaliCompiler {

    static String compiler(String fileName) {
        // Returns SaM code for the program in the file.
        try {
            SamTokenizer f = new SamTokenizer(fileName);
            String pgm = getProgram(f);
            return pgm;
        } catch (Exception e) {
            System.out.println("Fatal error: could not compile program");
            return "STOP\n";
        }
    }

    static String getProgram(SamTokenizer f) {
        try {
            String pgm = "";
            while (f.peekAtKind() != Tokenizer.EOF) {
                pgm += getMethod(f);
            }
            return pgm;
        } catch (Exception e) {
            System.out.println("Fatal error: could not compile program");
            return "STOP\n";
        }
    }

    static String getMethod(SamTokenizer f) {
        // TODO: add code to convert a method declaration to SaM code.
        // Since the only data type is an int, you can safely check for int
        // in the tokenizer.
        // TODO: add appropriate exception handlers to generate useful error msgs.
        f.check("int"); // must match at beginning
        String methodName = f.getString();
        f.check("("); // must be an opening parenthesis
        String formals = getFormals(f);
        f.check(")"); // must be a closing parenthesis
        // You would need to read in formals if any,
        // and then have calls to getDeclarations and getStatements.
    }

    static String getExp(SamTokenizer f) {
        switch (f.peekAtKind()) {
            case INTEGER: // E -> integer
                return "PUSHIMM " + f.getInt() + "\n";
            case OPERATOR: {
            }
            default:
                return "ERROR\n";
        }
    }
}

=== 1.4 Logistics ===

Make sure that your compiler is in the Java class assignment4.BaliCompiler. Your compiler should take two command-line arguments. The first argument is an input file containing a Bali program.
The second argument is an output file that will contain your generated SaM code.

== 2. Turn-in Instructions ==

Assignment submission will be done electronically using the turnin program. First create the following directory structure in your current directory:

assignment4/
assignment4/README          - Names of the students who worked on this assignment
assignment4/assignment4.jar - Compiled code for Problem 1
assignment4/src/.../*.java  - The source for all the code in assignment4.jar

Please verify that the assignment4 directory contains the required files (in particular a README file). You can submit your assignment by executing the following command:

turnin --submit rashid assignment4 assignment4

If you worked on this assignment with a partner, only one person needs to submit the assignment (but please remember to include your partner's name in the README file).
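
Update 5 in Section 0 suggests a separate symbol table class, with one instance per method. The following is a minimal sketch of one possible design, not a required one. The class name SymbolTable, its method names, and the frame layout (formals at negative offsets from the frame base, locals starting at offset 2, return-value slot just below the first formal) are all assumptions made for illustration; they are not part of the template or of the SaM library. Adjust the offsets to match whatever calling convention your generated code actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-method symbol table (not part of the assignment template).
// Assumed frame layout, relative to the frame base register:
//   -(n+1)     return-value slot
//   -n .. -1   the n formals, in declaration order
//    0, 1      bookkeeping cells (saved frame base, return address)
//    2 ..      locals, in declaration order
class SymbolTable {
    private static final int FIRST_LOCAL_OFFSET = 2; // assumption, see above

    private final List<String> formals = new ArrayList<>();
    private final List<String> locals = new ArrayList<>();

    // Record a formal parameter; call once per formal, in declaration order.
    void addFormal(String name) {
        formals.add(name);
    }

    // Record a local variable; call once per declared local, in order.
    void addLocal(String name) {
        locals.add(name);
    }

    boolean isDeclared(String name) {
        return formals.contains(name) || locals.contains(name);
    }

    // Frame-relative offset of a variable under the assumed layout.
    int offsetOf(String name) {
        int local = locals.indexOf(name);
        if (local >= 0) {
            return FIRST_LOCAL_OFFSET + local;
        }
        int formal = formals.indexOf(name);
        if (formal >= 0) {
            return formal - formals.size();
        }
        throw new IllegalArgumentException("undeclared variable: " + name);
    }

    // Offset of the slot where the callee stores its return value.
    int returnValueOffset() {
        return -(formals.size() + 1);
    }

    // Number of stack cells to reserve for locals on method entry.
    int localCount() {
        return locals.size();
    }

    // SaM code that pushes the variable's value onto the stack.
    String load(String name) {
        return "PUSHOFF " + offsetOf(name) + "\n";
    }

    // SaM code that pops the top of the stack into the variable.
    String store(String name) {
        return "STOREOFF " + offsetOf(name) + "\n";
    }
}
```

For a method such as int f(int a, int b) { int x; ... }, this sketch maps a, b, and x to offsets -2, -1, and 2 under the assumed layout. Throwing on an undeclared name gives the parser a natural place to report an informative error, as Section 1.1 requires.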