RAPIER

RAPIER is a bottom-up inductive learning system for learning information extract rules. It has been tested on several domains and performs comparably to or slightly better than other recent learning system for this task.

The C++ code for RAPIER is available via anonymous ftp. See the README file here for details. Pointers to papers on RAPIER can be found on our Natural Language Learning research page. Below is the standard reference (click on the open book image).

Relational Learning of Pattern-Match Rules for Information Extraction
Mary Elaine Califf and Raymond J. Mooney
Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), Orlando, FL, pp. 328-334, July, 1999.

Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. This paper presents a system, Rapier, that takes pairs of sample documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. Rapier employs a bottom-up learning algorithm which incorporates techniques from several inductive logic programming systems and acquires unbounded patterns that include constraints on the words, part-of-speech tags, and semantic classes present in the filler and the surrounding text. We present encouraging experimental results on two domains.

estlin@cs.utexas.edu