|atn.clj||Augmented Transition Net|
|gram.clj||Simple Example Grammar and Small Database for Orders|
|restgram.clj||Starter Restaurant Grammar|
|physgram.clj||Starter Physics Grammar|
|restdbsmall.clj||Smaller Restaurant Database|
|smalllex.clj||Small Restaurant Lexicon|
|restqueries.clj||Example Restaurant Queries|
|restaurant.clj||Loads restaurant files|
|physics.clj||Loads physics files|
Computer interfaces based on natural languages such as English can be used by non-programmers with little or no training. Such an interface can easily be constructed provided that the English coverage is limited to a narrow domain; this requirement is satisfied by database systems and by many application areas.
For this exercise, you are to construct a natural language interface using the semantic grammar technique. A semantic grammar differs from a general English grammar in that phrases in the grammar have meaning in the domain rather than being purely syntactic descriptions of English phrases. For example, a general English grammar would have categories such as "noun phrase", while a semantic grammar would have categories such as "date" or "part-name".
One thing that needs to be handled in the parser is the scanner pointer that points to the current position in the sentence. The position must be moved as the sentence is read, but it must be saved at strategic points to allow for backup in case of a parsing error. A set of simple functions to do this has been provided in the file atn.clj .
Lexicon entries have the form:
(category (word ... (word semantics) ...) )Each of the listed words is a member of the category. It is okay to have redundancies, slang, or abbreviations (these are good because they are usually short and unambiguous.) If a word is specified as a list of two items, the first item is the word and the second item is the semantics or internal form, e.g. (february 2) or (shoes shoe) to reduce a plural form to the singular form.
Parsing programs using the ATN functions can be written by hand, but there is a grammar compiler gramcom.clj that makes things easier. Enter (gramcom grammar) to compile each time you change your grammar.
A grammar rule has the form:
(nonterm -> (right-hand side items) semantics)nonterm is a nonterminal symbol that is the left-hand side of the production; the rule says that the left-hand side nonterminal can be composed of the sequence of items on the right-hand side.
The allowable items on the right-hand side are:
|word||exactly the specified word|
|(nonterminal)||a grammar nonterminal, like a subroutine call to a sub-grammar.|
|(category)||a member of the lexical category, e.g. (month)|
|?||preceding item is optional|
|(separate ? from a word by a space)|
The semantics is clojure code to be executed when the grammar rule is satisfied. The right-hand side items are available as variables $1, $2, $3, etc., similar to what is done in Yacc. For example, consider the rule:
(loc -> (in (city)) (restrict 'city $2))In this case, $2 refers to whatever matches the (city) part of the grammar rule, so that an input in berkeley would restrict the database records to those where the city field is berkeley.
If you use (def tracegram true), the grammar compiler will compile code to trace each grammar rule; this can help debugging.
There are two options for the application used for this assignment:
The database to be used is provided in the file restdb.clj; this is a version of the same database used in restaurant query systems by Prof. Mooney. A good first step in developing your program is to write a grammar that will describe the legal sentences that may be used in querying this database. The grammar might be in the form of productions or an ATN. Note that the grammar only needs to represent sensible questions about this one particular database. Your grammar should meet two goals: avoidance of ambiguity by accepting only the "correct" parsing of any phrase, and generality in the sense that a variety of database queries can be handled by the grammar.
An example of a restaurant query is:
(ask '(where can i get american food in berkeley))The file restqueries.clj gives some sample queries; you do not need to implement all of them, but you can use these queries for ideas for extending the grammar. The first 30 sample queries is enough for this assignment. Some natural language systems produce a parse tree as output; in this case, the desired output is a database query. The output from parsing each phrase will in general be a part of a query; by collecting the parts of the query, the parsing algorithm can output a completed query that can be executed by the database system. Use the supplied database system and database in the file restdb.clj to answer the questions; the file restlex.clj provides lexicon definitions. Note that phrases in the English question will in general do one of several things to the database query:
(postpr '(fn (quote $$)))where fn is the function to be called and $$ represents the answer to the query. For example, to answer the question How many ... the fn could be length. To answer ... the best ... , you could write a function to find the best-rated restaurant from the list of restaurants that were retrieved.
A grammar compiler, gramcom.clj, is provided to compile a Yacc-like grammar into ATN form. A small grammar for the database application is restgram.clj You can turn on a trace as the database queries are added by entering (def tracedb true) .
To use the code, start Clojure and enter:
(load-file "cs378/restaurant.clj") (load-files) (gramcom grammar) (ask '(where can i get american food in palo-alto))
Numerical questions about problems that are governed by algebraic equations can be answered by a system based on sets of equations and a symbolic algebra solver. Examples of questions include:
(phys '(what is the area of a circle with radius = 2)) (phys '(what is the circumference of a circle with area = 12)) (phys '(what is the power of a lift with mass = 5 and height = 10 and time = 4))
An example of a program of this type is at: http://www.cs.utexas.edu/users/novak/cgi/physdemod.cgi . (You are not expected to handle this many question types.)
For this option, you will need your files for the first three assignments.
To use the code, start Clojure and enter:
(load-file "cs378/physics.clj") (load-files) (gramcom grammar) (phys '(what is the area of a circle with radius = 2))
Some additional physics equations can be found here:
Note that there is an easy way to add units to the physics option for this assignment.
Another interesting category of questions is:
(phys '(how does the force of gravitation vary with radius)) (phys '(how does the area of a circle vary if radius is doubled))There is an easy trick for answering such questions:
You might consider adding some logical or comparison expressions to the database language. Make sure your responses to questions give the information that a user would want, e.g. street address of a restaurant.
Add new physical principles, units, and different kinds of questions for the algebraic solver.
Hendrix, G. G., Sacerdoti, E. D., Sagalowicz, D., and Slocum, J., "Developing a Natural Language Interface to Complex Data", ACM Trans. on Database Systems, Vol. 3, pp. 105-147, 1978.