Welcome to Geoquery!

A Learned Natural Language Interface to a US Geography Database

The Geoquery Demo is no longer available, but the resources below can still be downloaded.

Geoquery Database

Geoquery contains a small database of information about United States geography. It has about 800 facts, represented as Prolog assertions. The database mainly contains the following information:

The database can be downloaded from here.

Geoquery Training Corpus

The training corpus for the Geoquery system contains 880 natural-language queries, each paired with the corresponding query in a formal query language. Learning systems for semantic parsing are given these examples and must induce from them a semantic parser that can map novel natural-language queries to their formal forms. CHILL uses Prolog as its query language, while the other systems use an equivalent variable-free functional query language.

The training corpus can also be downloaded from here.
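
To make the idea of a variable-free functional query more concrete, here is a minimal Python sketch of how such a query could be evaluated against a toy database. Everything in it is an assumption made for illustration: the two-entry CAPITALS table, the function names answer, capital and stateid, and the example pair are hypothetical simplifications and are not copied from the actual Geoquery database or corpus.

```python
import ast

# Hypothetical miniature database (not the real Geoquery facts, which are
# Prolog assertions in the downloadable database).
CAPITALS = {"texas": "austin", "ohio": "columbus"}


def stateid(name):
    """Denote one state as a one-element set."""
    return {name}


def capital(states):
    """Map a set of states to the set of their capitals."""
    return {CAPITALS[s] for s in states if s in CAPITALS}


def answer(result):
    """Return the final result as a sorted list."""
    return sorted(result)


# Hypothetical function vocabulary for this sketch.
FUNCTIONS = {"answer": answer, "capital": capital, "stateid": stateid}

# An illustrative (natural language, formal query) pair in the spirit of the
# corpus; the formal side uses the simplified vocabulary above.
EXAMPLE = ("what is the capital of texas", "answer(capital(stateid('texas')))")


def evaluate(query):
    """Evaluate a nested functional query by walking its syntax tree."""
    def walk(node):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            fn = FUNCTIONS[node.func.id]
            return fn(*[walk(arg) for arg in node.args])
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported query term")
    return walk(ast.parse(query, mode="eval").body)


print(evaluate(EXAMPLE[1]))
```

Because the query contains no variables, it can be evaluated inside-out by plain function composition, which is the property that distinguishes this form from the Prolog queries used by CHILL.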

Learning Systems for Geoquery

The following systems learn semantic parsers for Geoquery:

CHILL is an Inductive Logic Programming (ILP) framework for learning semantic parsers. It starts with a very simple, overly-general deterministic shift-reduce parser and uses ILP to refine the parser by inductively building rules to control the parser's actions. For details please refer to this paper.

KRISP maps natural language sentences to their formal representations using string-kernel-based classifiers. Formal representations for novel natural language sentences are obtained by finding the most probable semantic parse using these string classifiers. For details please refer to this paper.
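
As a rough illustration of what a string kernel measures, the Python sketch below counts the word subsequences that two sentences share. It is a deliberate simplification and not KRISP's implementation: it is an unweighted all-subsequences kernel over words, with none of the gap penalties, normalization, or classifier training that the actual system may use, and the function name and example sentences are hypothetical.

```python
def common_subsequence_count(s, t):
    """Count matching pairs of non-empty word subsequences of s and t.

    K[i][j] holds the count for the first i words of s and the first j
    words of t, filled in by dynamic programming.
    """
    m, n = len(s), len(t)
    K = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Pairs that do not use both s[i-1] and t[j-1] (inclusion-exclusion).
            K[i][j] = K[i - 1][j] + K[i][j - 1] - K[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                # Pairs ending in this matched word: extend every earlier
                # pair, or start a new pair with this word alone.
                K[i][j] += K[i - 1][j - 1] + 1
    return K[m][n]


# Hypothetical example sentences, tokenized on whitespace.
a = "which rivers run through texas".split()
b = "which rivers flow through ohio".split()
print(common_subsequence_count(a, b))
```

A larger count means the two sentences share more ordered word patterns, which is the kind of similarity signal a kernel-based classifier can use to decide whether a sentence expresses a given piece of the formal representation.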

SCISSOR uses an integrated statistical parser to produce a semantically augmented parse tree, in which each non-terminal node has both a syntactic and a semantic label. A compositional-semantics procedure is then used to map the augmented parse tree into a formal representation. For details please refer to this paper.

WASP uses state-of-the-art statistical machine translation techniques to map natural language sentences to their formal language representations. A word alignment model is used for lexical acquisition, and the parsing model itself can be seen as a syntax-based translation model. For details please refer to this paper.

Go to the Machine Learning Group homepage

For questions or comments email: rjkate@cs.utexas.edu