Relational Learning Techniques for Natural Language Information Extraction (1998)
The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This dissertation presents a novel rule representation specific to natural language and a relational learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be extracted and learns pattern-matching rules to extract fillers for the slots in the template. The system is tested on several domains, showing its ability to learn rules for different tasks. Rapier's performance is compared to a propositional learning system for information extraction, demonstrating the superiority of relational learning for some information extraction tasks. Because one difficulty in using machine learning to develop natural language processing systems is the necessity of providing annotated examples to supervised learning systems, this dissertation also describes an attempt to reduce the number of examples Rapier requires by employing a form of active learning. Experimental results show that the number of examples required to achieve a given level of performance can be significantly reduced by this method.
PhD Thesis, Department of Computer Sciences, University of Texas. 142 pages. Also appears as Artificial Intelligence Laboratory Technical Report AI 98-276.

Mary Elaine Califf Ph.D. Alumni mecaliff [at] ilstu edu