Write a parser for the Pascal language (as defined by Jensen and Wirth, 2nd ed.), producing intermediate code trees as output. Your parser does not have to handle the following Pascal constructs: with ... do, packed, file of, set of, case; in general, if your parser legitimately handles the test programs, it will satisfy this assignment. Use one of the lexical analyzers that you wrote in the first two assignments as the lexical analyzer for this assignment. You may modify the lexical analyzer slightly if needed.
You may write the parser in one of the following ways (your choice):
The file pars1.y contains a small Yacc program for a Pascal subset; the files pars1c.c and pars1.lsp contain equivalent programs using operator precedence and recursive descent. You may use one of these files as the starting point for your program.
Yacc is described in the textbook in Section 4.9 . There is a man description of yacc on-line. Several books on Lex and Yacc are available. Yacc allows the syntax of a language to be described using context-free expressions. The Yacc compiler compiles these to produce tables to drive an LALR parser to parse the language. When a complete phrase has been found, a user-specified set of actions (written in C) is executed to produce the desired output, with special variable forms such as $1 etc. used to refer to the results associated with the symbols in the grammar production. Yacc is designed to use Lex by calling a routine yylex to get the next token and by getting the value of the token through the variable yylval.
If you wish, you can use your scanner program (rather than Lex program) with Yacc by: (1) renaming it yylex; (2) setting the variable yylval to the token value; and (3) returning as the value of yylex the operator number added to OPERATOR_BIAS, etc.
Routines for handling a symbol table and for pretty-printing intermediate code trees are provided in the files symtab.c and pprint.c; for Lisp, the files symtab.lsp and tokendefs.lsp contain similar functions. The files token.txt and symtab.txt describe token and symbol table entries.
The form of intermediate code will be abstract syntax trees using the same tokens as in the first two assignments. The file token.h provides constant values needed for Yacc or for a parser written in C; tokendefs.lsp has definitions for Lisp. Operators fix and float are defined to convert between fixed point (integer) and floating point; these will be generated by the compiler when needed. See the file pprint.c or tokendefs.lsp for a complete list of operator codes. File trivb.tree shows the tree structure for trivb.pas.
In addition to expression trees representing assignment statements, we will use trees with additional operators to represent other intermediate code statements:
|statement list:||(progn statement1 ... statementn )|
|conditional branch:||(if exp   statement1   statement2 ) or (if exp   statement1 )|
|goto:||(goto n )|
|label:||(label n )|
|function call:||(funcall fn   arg1 ... argn )|
|array reference:||(aref base   offset )|
|program:||(program name (progn args ) code )|
Only these forms are used in the intermediate code; other constructs, such as for loops, must be converted to these basic forms by the parser. The following is an example of output for graph1.pas; variations such as extra nested progn groups are okay. White space does not matter, but level of parenthesis nesting (shown by vertical alignment) does matter.
(program graph1 (progn output) (progn (:= i 0) (label 1) (if (<= i 32) (progn (:= x (* 6.250000e-02 (float i))) (:= y (* (funcall exp (- x)) (funcall sin (* 6.283180e+00 x)))) (:= n (fix (+ (funcall round (* 3.200000e+01 y)) 3.400000e+01))) (progn (label 0) (funcall write ' ') (:= n (- n 1)) (if (= n 0) (progn) (goto 0))) (funcall writeln '*') (:= i (+ i 1)) (goto 1)))))
Test your program on the files trivb.pas, graph1.pas, and pasrec.pas . graph1.pas is taken from the Jensen and Wirth book, while pasrec.pas exercises pointer, record, and array operations; we will compile them to machine code and execute them.