CS 381K: Natural Language Interfaces

Due: November 21, 2007.

Computer interfaces based on natural languages such as English can be used by non-programmers with little or no training. Such an interface can easily be constructed provided that the English coverage is limited to a narrow domain; this requirement is satisfied by database systems and by many application areas.

For this exercise, you are to construct a natural language interface using the semantic grammar technique. A semantic grammar differs from a general English grammar in that phrases in the grammar have meaning in the domain rather than being purely syntactic descriptions of English phrases. For example, a general English grammar would have categories such as "noun phrase", while a semantic grammar would have categories such as "date" or "part-name".

Several files of Lisp programs are provided to be used with this assignment; these files are at: ftp://ftp.cs.utexas.edu/pub/novak/cs381k/

One thing that needs to be handled in the parser is the scanner pointer that points to the current position in the sentence. The position must be moved as the sentence is read, but it must be saved at strategic points to allow for backup in case of a parsing error. A set of simple functions to do this has been provided in the file atn.lsp ; these functions also recognize multi-word lexical terms.

Parsing programs using the ATN functions can be written by hand, as in the file nldb.lsp, or using the grammar compiler gramcom.lsp and a grammar as shown in the file gram.lsp.

There are two options for the application used for this assignment:

Choose one of these applications for your assignment. A starter file is provided for each; expand the language coverage of the grammar and the kinds of questions that can be answered. See the file README.nl

Relational Database

The database to be used is provided in the file restdb.lsp; this is a version of the same database used in the restaurant query system at: http://www.cs.utexas.edu/users/ml/rest.html . A good first step in developing your program is to write a grammar that will describe the legal sentences that may be used in querying this database. The grammar might be in the form of productions or an ATN. Note that the grammar only needs to represent sensible questions about this one particular database. Your grammar should meet two goals: avoidance of ambiguity by accepting only the "correct" parsing of any phrase, and generality in the sense that a variety of database queries can be handled by the grammar.

Examples of restaurant queries include:

%lisp
(load "/projects/cs381k/restaurant.lsp")
(askr '(show me a good chinese restaurant in los altos))
(askr '(where can i get ice cream in berkeley))
The file restqueries.lsp gives some sample queries; you do not need to implement all of them, but you can use these queries for ideas for extending the grammar. Some natural language systems produce a parse tree as output; in this case, the desired output is a database query. The output from parsing each phrase will in general be a part of a query; by collecting the parts of the query, the parsing algorithm can output a completed query that can be executed by the database system. Use the supplied database system and database in the file restdb.lsp to answer the questions; the file restlex.lsp provides lexicon definitions. Note that phrases in the English question will in general do one of two things to the database query:
  1. Restrict the set of records applicable to the query (e.g., "in los altos").
  2. Request information to be reported from the chosen records (e.g., "where ...").

A grammar compiler, gramcom.lsp, is provided to compile a Yacc-like grammar into Lisp. A small grammar for the database application is restgram.lsp

You can turn on traces of the grammar's actions by (setq *tracenl* t) . You can examine grammar functions using (symbol-function 's) and using symbol-function to examine the functions compiled for other grammar rules.

Algebraic Solver

Numerical questions about problems that are governed by algebraic equations can be answered by a system based on sets of equations and a symbolic algebra solver. Examples of questions include:

%lisp
(load "/projects/cs381k/physics.lsp")
(phys '(what is the area of a circle with radius = 2))

(phys '(what is the circumference of a circle with area = 12))

(phys '(what is the power of a lift with mass = 5 and height = 10
        and time = 4))

An example of a program of this type is at: http://www.cs.utexas.edu/users/novak/cgi/physdemo.cgi . (You are not expected to handle units, nor to handle this many question types.)

The following files are needed for this option: atn.lsp, gramcom.lsp, physgram.lsp, solveq.lsp, patmatch.lsp

Grading Criteria:

Expand the coverage of your program to allow more complex queries than the examples given. You might consider adding logical or comparison expressions to the database language. Add new physical principles and different kinds of questions for the algebraic solver.

References:

Woods, W. A., "Transition Network Grammars for Natural Language Analysis", Communications of the ACM, Oct. 1970, pp. 591-606.

Hendrix, G. G., Sacerdoti, E. D., Sagalowicz, D., and Slocum, J., "Developing a Natural Language Interface to Complex Data", ACM Trans. on Database Systems, Vol. 3, pp. 105-147, 1978.