Statistical Relational Learning for Natural Language Information Extraction (2007)
Understanding natural language presents many challenging problems that lend themselves to statistical relational learning (SRL). Historically, both logical and probabilistic methods have found wide application in natural language processing (NLP). NLP inevitably involves reasoning about an arbitrary number of entities (people, places, and things) that have an unbounded set of complex relationships between them. Representing and reasoning about unbounded sets of entities and relations has generally been considered a strength of predicate logic. However, NLP also requires integrating uncertain evidence from a variety of sources in order to resolve numerous syntactic and semantic ambiguities. Effectively integrating multiple sources of uncertain evidence has generally been considered a strength of Bayesian probabilistic methods and graphical models. Consequently, NLP problems are particularly suited for SRL methods that combine the strengths of first-order predicate logic and probabilistic graphical models. In this article, we review our recent work on using Relational Markov Networks (RMNs) for information extraction, the problem of identifying phrases in natural language text that refer to specific types of entities. We use the expressive power of RMNs to represent and reason about several specific relationships between candidate entities and thereby collectively identify the appropriate set of phrases to extract. We present experiments on learning to extract protein names from biomedical text, which demonstrate the advantage of this approach over existing IE methods.
In Introduction to Statistical Relational Learning, L. Getoor and B. Taskar (Eds.), pp. 535-552, Cambridge, MA 2007. MIT Press.

Razvan Bunescu Ph.D. Alumni bunescu [at] ohio edu
Raymond J. Mooney Faculty mooney [at] cs utexas edu