Write a lexical analyzer for Pascal. The program may be written in Lisp or in C.
Input to the Lexical Analyzer is obtained by calling the functions getchar, peekchar, and peek2char (none of which takes arguments). getchar returns the next character from the input and advances the character pointer; peekchar returns the next character without moving the pointer; peek2char returns the character after that one, also without moving the pointer. These functions are provided.
The Lexical Analyzer is called as the function gettoken (with no arguments); its output should be one token. A token record contains, among other things, the following fields:
Blanks, tabs, ends of lines, and comments are considered to be separators; the Lexical Analyzer consumes (skips over) these, but does not return anything for them. Comments are contained between the characters { and } , or between (* and *) . A comment may not contain the terminating character or character pair; comments cannot be nested.
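The separator rules above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `set_input`, `next_char`, `peek_char`, and `peek2_char` are hypothetical stand-ins that read from an in-memory string so the example is self-contained; in the assignment, the provided getchar, peekchar, and peek2char would take their place. End of input is represented here as -1, which is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the provided getchar / peekchar / peek2char.
   They read from a string; -1 marks end of input (an assumption). */
static const char *src = "";
static size_t pos = 0;

static void set_input(const char *text)
{
    assert(text != NULL);
    src = text;
    pos = 0;
}

static int peek_char(void)
{
    return src[pos] ? (unsigned char)src[pos] : -1;
}

static int peek2_char(void)
{
    return (src[pos] && src[pos + 1]) ? (unsigned char)src[pos + 1] : -1;
}

static int next_char(void)
{
    int c = peek_char();
    if (c != -1)
        pos++;
    return c;
}

/* Consume blanks, tabs, line ends, and both comment forms.
   Returns when the next character starts a real token or input ends. */
static void skip_separators(void)
{
    for (;;) {
        int c = peek_char();
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            next_char();
        } else if (c == '{') {
            /* { ... } comment: skip through the first closing brace. */
            while ((c = next_char()) != -1 && c != '}')
                ;
        } else if (c == '(' && peek2_char() == '*') {
            next_char();                      /* consume '(' */
            next_char();                      /* consume '*' */
            /* (* ... *) comment: stop at '*' immediately followed by ')'. */
            while ((c = next_char()) != -1) {
                if (c == '*' && peek_char() == ')') {
                    next_char();              /* consume ')' */
                    break;
                }
            }
        } else {
            return;
        }
    }
}
```

Checking the second character before committing to a comment is what keeps a plain ( delimiter from being mistaken for the start of (* .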
Operators are as follows: + - * / := = <> < <= >= > ^ .
The following Reserved Words are treated as Operators: and or not div mod in
Delimiters are: , ; : ( ) [ ] ..
The result returned for an Operator or Delimiter is:
| tokentype: | OPERATOR (0) or DELIMITER (1) |
| whichval: | integer denoting which operator (1..19) or delimiter (1..8). |
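Several operators and delimiters share a first character ( : and := , < and <= and <> , . and .. ), so the scanner has to try the longer form first. A minimal sketch of that longest-match idea is shown here. The function name `match_special` is hypothetical, it works on a string rather than on the provided input functions, and it leaves out the whichval numbering, which comes from the course's token definitions.

```c
#include <assert.h>
#include <string.h>

/* Token type codes as given in the assignment text. */
enum { OPERATOR = 0, DELIMITER = 1 };

/* Return the length (1 or 2) of the operator or delimiter at the start
   of text and store its token type in *tokentype; return 0 if text does
   not start with one.  Hypothetical helper for illustration only. */
static int match_special(const char *text, int *tokentype)
{
    static const char *const two_char_ops[] = { ":=", "<>", "<=", ">=" };
    size_t i;

    assert(text != NULL && tokentype != NULL);

    /* Try two-character tokens first so ":=" is not split into ":" "=". */
    for (i = 0; i < sizeof two_char_ops / sizeof two_char_ops[0]; i++) {
        if (strncmp(text, two_char_ops[i], 2) == 0) {
            *tokentype = OPERATOR;
            return 2;
        }
    }
    if (strncmp(text, "..", 2) == 0) {
        *tokentype = DELIMITER;
        return 2;
    }

    /* Single-character forms; the '\0' guard is needed because strchr
       would otherwise match the terminator of the search string. */
    if (text[0] != '\0' && strchr("+-*/=<>^.", text[0]) != NULL) {
        *tokentype = OPERATOR;
        return 1;
    }
    if (text[0] != '\0' && strchr(",;:()[]", text[0]) != NULL) {
        *tokentype = DELIMITER;
        return 1;
    }
    return 0;
}
```

In the actual scanner the same decision would be made with peekchar after reading the first character, and comments beginning with (* would already have been consumed as separators.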
The other Reserved Words are:
array downto function of repeat until begin else goto packed set var case end if procedure then while const file label program to with do for nil record type
The result returned for a Reserved Word is:
| tokentype: | RESERVED (2) |
| whichval: | integer denoting which reserved word (1..29) |
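Once a word has been scanned, it has to be classified as a word operator, a reserved word, or an identifier. One possible approach is a table search with strcmp, sketched below. The numbering is an assumption in two places: word operators are taken to follow the 13 symbol operators (so they start at 14), and reserved words are numbered alphabetically. The course's token definitions are authoritative for both. `classify_word` is a hypothetical name, and the sketch expects the word to be in lower case already.

```c
#include <assert.h>
#include <string.h>

/* Token type codes as given in the assignment text. */
enum { OPERATOR = 0, RESERVED = 2, IDENTIFIERTOK = 3 };

/* ASSUMPTION: the 13 symbol operators are numbered first. */
#define FIRST_WORD_OPERATOR 14

static const char *const operator_words[] = {
    "and", "or", "not", "div", "mod", "in"
};

/* ASSUMPTION: alphabetical numbering of the 29 reserved words. */
static const char *const reserved_words[] = {
    "array", "begin", "case", "const", "do", "downto", "else", "end",
    "file", "for", "function", "goto", "if", "label", "nil", "of",
    "packed", "procedure", "program", "record", "repeat", "set", "then",
    "to", "type", "until", "var", "while", "with"
};

#define COUNT(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Return the token type for a scanned word and store its 1-based
   number in *whichval (0 for an identifier). */
static int classify_word(const char *word, int *whichval)
{
    int i;

    assert(word != NULL && whichval != NULL);

    for (i = 0; i < COUNT(operator_words); i++) {
        if (strcmp(word, operator_words[i]) == 0) {
            *whichval = FIRST_WORD_OPERATOR + i;
            return OPERATOR;
        }
    }
    for (i = 0; i < COUNT(reserved_words); i++) {
        if (strcmp(word, reserved_words[i]) == 0) {
            *whichval = i + 1;
            return RESERVED;
        }
    }
    *whichval = 0;
    return IDENTIFIERTOK;
}
```

A linear search is adequate for tables this small; a word that matches neither table is returned as an identifier, with its text going into stringval.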
The result returned for an Identifier is:
| tokentype: | IDENTIFIERTOK (3) |
| stringval: | string (identifier name) |
The result returned for a String is:
| tokentype: | STRINGTOK (4) |
| stringval: | string |
Unsigned Numbers must begin with a digit; if there is a decimal point, it must be followed by at least one digit. A number may be followed by a signed decimal exponent, in which case it is floating-point (whether there is a decimal point or not).
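The rule that a decimal point must be followed by a digit matters for input such as 3..5 , where the number is the integer 3 and the next token is the .. delimiter. The sketch below only measures the extent of a number and decides whether it is integer or floating-point; it does not compute the value. `measure_number` is a hypothetical helper that works on a string for the sake of a self-contained example, and it treats the exponent sign as optional, which is an assumption about how "signed" is meant.

```c
#include <assert.h>
#include <stddef.h>

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Measure the unsigned number at the start of text.  Returns its length
   in characters and sets *is_real to 1 for floating-point, 0 for
   integer.  Hypothetical helper for illustration only. */
static int measure_number(const char *text, int *is_real)
{
    int i = 0;

    assert(text != NULL && is_real != NULL);
    assert(is_digit(text[0]));        /* numbers must begin with a digit */

    *is_real = 0;
    while (is_digit(text[i]))
        i++;

    /* A '.' belongs to the number only if a digit follows it. */
    if (text[i] == '.' && is_digit(text[i + 1])) {
        *is_real = 1;
        i++;
        while (is_digit(text[i]))
            i++;
    }

    /* Exponent: e or E, optional sign (assumption), at least one digit. */
    if (text[i] == 'e' || text[i] == 'E') {
        int j = i + 1;
        if (text[j] == '+' || text[j] == '-')
            j++;
        if (is_digit(text[j])) {
            *is_real = 1;
            while (is_digit(text[j]))
                j++;
            i = j;
        }
    }
    return i;
}
```

In the scanner itself, the look at the character after the decimal point corresponds to a call to peek2char, since the point has not been consumed yet.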
The lexical analyzer must convert a number to internal numeric form. Care must be taken to ensure that the result is numerically accurate and that errors such as numeric overflow are detected. Challenging examples are included in the test data; correct handling of these examples will be a grading criterion. If there is an error, your program should print an error message and return a number token of the correct type (integer or real); the value of the number will not matter.
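For integers, overflow can be detected before it happens by checking each digit against INT_MAX prior to multiplying. The following is a minimal sketch of that check, not a complete number converter: `convert_digits` is a hypothetical helper that works on a string of digits, and it leaves error reporting and floating-point conversion to the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Convert a run of decimal digits to int, one character at a time.
   Sets *overflow to 1 if the value would exceed INT_MAX; the returned
   value is then meaningless.  Hypothetical helper for illustration. */
static int convert_digits(const char *text, int *overflow)
{
    int value = 0;

    assert(text != NULL && overflow != NULL);
    assert(text[0] >= '0' && text[0] <= '9');

    *overflow = 0;
    for (; *text >= '0' && *text <= '9'; text++) {
        int digit = *text - '0';

        if (*overflow)
            continue;                 /* keep consuming the digits */

        /* value * 10 + digit fits exactly when
           value <= (INT_MAX - digit) / 10, so test before multiplying. */
        if (value > (INT_MAX - digit) / 10)
            *overflow = 1;
        else
            value = value * 10 + digit;
    }
    return value;
}
```

The loop keeps consuming digits after an overflow so that the whole number is removed from the input and the next token starts in the right place.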
The result returned for a Number is:
| tokentype: | NUMBERTOK (5) |
| datatype: | integer denoting type: INTEGER or REAL |
| tokenval: | numeric value in internal form. For C, use intval and realval |
Use the C convention of terminating a string with a '\0' character.
You must perform scanning at the character level; you are not allowed to use C library functions that do the work of the lexical analyzer, such as sscanf. However, you may use standard string library functions such as strcmp.
(read-file "/u/cs375/graph1.pas")
After calling read-file, the call (test-scanner) will test your program on the file. The file tokendefs.lsp contains macro definitions for tokens and the functions talloc (which makes a token structure) and printtoken.
Character constants are written with a #\ prefix, e.g. the character * is written #\* ; the blank space is written #\Space .
Useful Common Lisp functions: char-code, make-string, char-downcase, char=, string=.
You must perform scanning at the character level; you are not allowed to use Lisp functions that do the work of the lexical analyzer, such as intern or read-from-string.