UT ML Group: Natural Language Learning

Natural language processing systems are difficult to build, and machine learning methods can help automate their construction significantly. Our research in learning for natural language mainly involves applying inductive logic programming and other relational learning techniques to constructing database interfaces and information extraction systems from supervised examples. However, we have also conducted research in learning for syntactic parsing, machine translation, word-sense disambiguation, and morphology (past tense generation).

Demos of learning natural-language interfaces:


Publications

Also see

To view a paper, click on the pdf image.

  1. Learning a Compositional Semantic Parser using an Existing Syntactic Parser
    Ruifang Ge and Raymond J. Mooney
    In the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2009), pp. 611--619, Suntec, Singapore, August 2009.
    Paper ID: 229

    We present a new approach to learning a semantic parser (a system that maps natural language sentences into logical form). Unlike previous methods, it exploits an existing syntactic parser to produce disambiguated parse trees that drive the compositional semantic interpretation. The resulting system produces improved results on standard corpora on natural language interfaces for database querying and simulated robot control.

  2. A Dependency-based Word Subsequence Kernel
    Rohit J. Kate
    In Proceedings of the conference on Empirical Methods in Natural Language Processing (EMNLP-2008) , pp. 400-409, Waikiki, Honolulu, Hawaii, October 2008.
    Paper ID: 223

    This paper introduces a new kernel which computes similarity between two natural language sentences as the number of paths shared by their dependency trees. The paper gives a very efficient algorithm to compute it. This kernel is also an improvement over the word subsequence kernel because it only counts linguistically meaningful word subsequences which are based on word dependencies. It overcomes some of the difficulties encountered by syntactic tree kernels as well. Experimental results demonstrate the advantage of this kernel over word subsequence and syntactic tree kernels.

  3. Transforming Meaning Representation Grammars to Improve Semantic Parsing
    Rohit J. Kate
    In Proceedings of the Twelfth Conference on Computational Natural Language Learning (CoNLL-2008) , pp. 33-40, Manchester, UK, August 2008.
    Paper ID: 222

    A semantic parser learning system learns to map natural language sentences into their domain-specific formal meaning representations, but if the constructs of the meaning representation language do not correspond well with the natural language then the system may not learn a good semantic parser. This paper presents approaches for automatically transforming a meaning representation grammar (MRG) to conform it better with the natural language semantics. It introduces grammar transformation operators and meaning representation macros which are applied in an error-driven manner to transform an MRG while training a semantic parser learning system. Experimental results show that the automatically transformed MRGs lead to better learned semantic parsers which perform comparable to the semantic parsers learned using manually engineered MRGs.

  4. Learning to Sportscast: A Test of Grounded Language Acquisition
    David L. Chen and Raymond J. Mooney
    In Proceedings of the 25th International Conference on Machine Learning (ICML) , Helsinki, Finland, July 2008.
    Paper ID: 219

    We present a novel commentator system that learns language from sportscasts of simulated soccer games. The system learns to parse and generate commentaries without any engineered knowledge about the English language. Training is done using only ambiguous supervision in the form of textual human commentaries and simulation states of the soccer games. The system simultaneously tries to establish correspondences between the commentaries and the simulation states as well as build a translation model. We also present a novel algorithm, Iterative Generation Strategy Learning (IGSL), for deciding which events to comment on. Human evaluations of the generated commentaries indicate they are of reasonable quality compared to human commentaries.

  5. Learning to Connect Language and Perception
    Raymond J. Mooney
    In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), Senior Member Paper, Chicago, IL, pp. 1598-1601, July 2008.
    Paper ID: 216

    To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Current natural language processing and computer vision systems make extensive use of machine learning to acquire the probabilistic knowledge needed to comprehend linguistic and visual input. However, to date, there has been relatively little work on learning the relationships between the two modalities. In this talk, I will review some of the existing work on learning to connect language and perception, discuss important directions for future research in this area, and argue that the time is now ripe to make a concerted effort to address this important, integrative AI problem.

  6. Learning for Semantic Parsing with Kernels under Various Forms of Supervision
    Rohit J. Kate
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2007.
    159 pages.
    Paper ID: 215

    Semantic parsing involves deep semantic analysis that maps natural language sentences to their formal executable meaning representations. This is a challenging problem and is critical for developing computing systems that understand natural language input. This thesis presents a new machine learning approach for semantic parsing based on string-kernel-based classification. It takes natural language sentences paired with their formal meaning representations as training data. For every production in the formal language grammar, a Support-Vector Machine (SVM) classifier is trained using string similarity as the kernel. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these classifiers. This method does not use any hard-matching rules and unlike previous and other recent methods, does not use grammar rules for natural language, probabilistic or otherwise, which makes it more robust to noisy input.

    Besides being robust, this approach is also flexible and able to learn under a wide range of supervision, from extra to weaker forms of supervision. It can easily utilize extra supervision given in the form of syntactic parse trees for natural language sentences by using a syntactic tree kernel instead of a string kernel. Its learning algorithm can also take advantage of detailed supervision provided in the form of semantically augmented parse trees. A simple extension using transductive SVMs enables the system to do semi-supervised learning and improve its performance utilizing unannotated sentences which are usually easily available. Another extension involving EM-like retraining makes the system capable of learning under ambiguous supervision in which the correct meaning representation for each sentence is not explicitly given, but instead a set of possible meaning representations is given. This weaker and more general form of supervision is better representative of a natural training environment for a language-learning system requiring minimal human supervision.

    For a semantic parser to work well, conformity between natural language and meaning representation grammar is necessary. However meaning representation grammars are typically designed to best suit the application which will use the meaning representations with little consideration for how well they correspond to natural language semantics. We present approaches to automatically transform meaning representation grammars to make them more compatible with natural language semantics and hence more suitable for learning semantic parsers. Finally, we also show that ensembles of different semantic parser learning systems can obtain the best overall performance.

  7. Learning for Semantic Parsing and Natural Language Generation Using Statistical Machine Translation Techniques
    Yuk Wah Wong
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2007.
    188 pages.
    Also appears as Technical Report AI07-343, Artificial Intelligence Lab, University of Texas at Austin, August 2007.
    Paper ID: 214

    One of the main goals of natural language processing (NLP) is to build automated systems that can understand and generate human languages. This goal has so far remained elusive. Existing hand-crafted systems can provide in-depth analysis of domain sub-languages, but are often notoriously fragile and costly to build. Existing machine-learned systems are considerably more robust, but are limited to relatively shallow NLP tasks.

    In this thesis, we present novel statistical methods for robust natural language understanding and generation. We focus on two important sub-tasks, semantic parsing and tactical generation. The key idea is that both tasks can be treated as the translation between natural languages and formal meaning representation languages, and therefore, can be performed using state-of-the-art statistical machine translation techniques. Specifically, we use a technique called synchronous parsing, which has been extensively used in syntax-based machine translation, as the unifying framework for semantic parsing and tactical generation. The parsing and generation algorithms learn all of their linguistic knowledge from annotated corpora, and can handle natural-language sentences that are conceptually complex.

    A nice feature of our algorithms is that the semantic parsers and tactical generators share the same learned synchronous grammars. Moreover, charts are used as the unifying language-processing architecture for efficient parsing and generation. Therefore, the generators are said to be the inverse of the parsers, an elegant property that has been widely advocated. Furthermore, we show that our parsers and generators can handle formal meaning representation languages containing logical variables, including predicate logic.

    Our basic semantic parsing algorithm is called WASP. Most of the other parsing and generation algorithms presented in this thesis are extensions of WASP or its inverse. We demonstrate the effectiveness of our parsing and generation algorithms by performing experiments in two real-world, restricted domains. Experimental results show that our algorithms are more robust and accurate than the currently best systems that require similar supervision. Our work is also the first attempt to use the same automatically-learned grammar for both parsing and generation. Unlike previous systems that require manually-constructed grammars and lexicons, our systems require much less knowledge engineering and can be easily ported to other languages and domains.

  8. Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction
    Razvan Constantin Bunescu
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2007.
    150 pages.
    Also appears as Technical Report AI07-345, Artificial Intelligence Lab, University of Texas at Austin, August 2007.
    Paper ID: 213

    Information Extraction, the task of locating textual mentions of specific types of entities and their relationships, aims at representing the information contained in text documents in a structured format that is more amenable to applications in data mining, question answering, or the semantic web. The goal of our research is to design information extraction models that obtain improved performance by exploiting types of evidence that have not been explored in previous approaches. Since designing an extraction system through introspection by a domain expert is a laborious and time consuming process, the focus of this thesis will be on methods that automatically induce an extraction model by training on a dataset of manually labeled examples.

    Named Entity Recognition is an information extraction task that is concerned with finding textual mentions of entities that belong to a predefined set of categories. We approach this task as a phrase classification problem, in which candidate phrases from the same document are collectively classified. Global correlations between candidate entities are captured in a model built using the expressive framework of Relational Markov Networks. Additionally, we propose a novel tractable approach to phrase classification for named entity recognition based on a special Junction Tree representation.

    Classifying entity mentions into a predefined set of categories achieves only a partial disambiguation of the names. This is further refined in the task of Named Entity Disambiguation, where names need to be linked to their actual denotations. In our research, we use Wikipedia as a repository of named entities and propose a ranking approach to disambiguation that exploits learned correlations between words from the name context and categories from the Wikipedia taxonomy.

    Relation Extraction refers to finding relevant relationships between entities mentioned in text documents. Our approaches to this information extraction task differ in the type and the amount of supervision required. We first propose two relation extraction methods that are trained on documents in which sentences are manually annotated for the required relationships. In the first method, the extraction patterns correspond to sequences of words and word classes anchored at two entity names occurring in the same sentence. These are used as implicit features in a generalized subsequence kernel, with weights computed through training of Support Vector Machines. In the second approach, the implicit extraction features are focused on the shortest path between the two entities in the word-word dependency graph of the sentence. Finally, in a significant departure from previous learning approaches to relation extraction, we propose reducing the amount of required supervision to only a handful of pairs of entities known to exhibit or not exhibit the desired relationship. Each pair is associated with a bag of sentences extracted automatically from a very large corpus. We extend the subsequence kernel to handle this weaker form of supervision, and describe a method for weighting features in order to focus on those correlated with the target relation rather than with the individual entities. The resulting Multiple Instance Learning approach offers a competitive alternative to previous relation extraction methods, at a significantly reduced cost in human supervision.

  9. Learning Language Semantics from Ambiguous Supervision
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-2007), Vancouver, BC, pp. 895-900, July 2007.
    Paper ID: 200

    This paper presents a method for learning a semantic parser from ambiguous supervision. Training data consists of natural language sentences annotated with multiple potential meaning representations, only one of which is correct. Such ambiguous supervision models the type of supervision that can be more naturally available to language-learning systems. Given such weak supervision, our approach produces a semantic parser that maps sentences into meaning representations. An existing semantic parsing learning system that can only learn from unambiguous supervision is augmented to handle ambiguous supervision. Experimental results show that the resulting system is able to cope up with ambiguities and learn accurate semantic parsers.

  10. Learning Synchronous Grammars for Semantic Parsing with Lambda Calculus
    Yuk Wah Wong and Raymond J. Mooney
    Best Paper Award
    In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007), pp. 960-967, Prague, Czech Republic, June 2007.
    Paper ID: 199

    This paper presents the first empirical results to our knowledge on learning synchronous grammars that generate logical forms. Using statistical machine translation techniques, a semantic parser based on a synchronous context-free grammar augmented with lambda-operators is learned given a set of training sentences and their correct logical forms. The resulting parser is shown to be the best-performing system so far in a database query domain.

  11. Learning to Extract Relations from the Web using Minimal Supervision
    Razvan C. Bunescu and Raymond J. Mooney
    In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL) , Prague, Czech Republic, pp. 576--583, June 2007.
    Paper ID: 204

    We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.

  12. Semi-Supervised Learning for Semantic Parsing using Support Vector Machines
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers (NAACL/HLT-2007), pp. 81-84, Rochester, NY, April 2007.
    Paper ID: 198

    We present a method for utilizing unannotated sentences to improve a semantic parser which maps natural language (NL) sentences into their formal meaning representations (MRs). Given NL sentences annotated with their MRs, the initial supervised semantic parser learns the mapping by training Support Vector Machine (SVM) classifiers for every production in the MR grammar. Our new method applies the learned semantic parser to the unannotated sentences and collects unlabeled examples which are then used to retrain the classifiers using a variant of transductive SVMs. Experimental results show the improvements obtained over the purely supervised parser, particularly when the annotated training set is small.

  13. Generation by Inverting a Semantic Parser That Uses Statistical Machine Translation
    Yuk Wah Wong and Raymond J. Mooney
    In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL/HLT-2007), pp. 172-179, Rochester, NY, April 2007.
    Paper ID: 197

    This paper explores the use of statistical machine translation (SMT) methods for tactical natural language generation. We present results on using phrase-based SMT for learning to map meaning representations to natural language. Improved results are obtained by inverting a semantic parser that uses SMT methods to map sentences into meaning representations. Finally, we show that hybridizing these two approaches results in still more accurate generation systems. Automatic and human evaluation of generated sentences are presented across two domains and four languages.

  14. Extracting Relations from Text: From Word Sequences to Dependency Paths
    Razvan C. Bunescu and Raymond J. Mooney
    Text Mining and Natural Language Processing, Anne Kao and Steve Poteet (eds.), pp. 29-44, Springer, 2007.
    Paper ID: 186

  15. Learning for Semantic Parsing
    Raymond J. Mooney
    Computational Linguistics and Intelligent Text Processing: Proceedings of the 8th International Conference, CICLing 2007, Mexico City (invited paper), A. Gelbukh (Ed.), pp. 311-324, Springer, Berlin, Germany, February 2007.
    Paper ID: 196

    Semantic parsing is the task of mapping a natural language sentence into a complete, formal meaning representation. Over the past decade, we have developed a number of machine learning methods for inducing semantic parsers by training on a corpus of sentences paired with their meaning representations in a specified formal language. We have demonstrated these methods on the automated construction of natural-language interfaces to databases and robot command languages. This paper reviews our prior work on this topic and discusses directions for future research.

  16. Learning Language from Perceptual Context: A Challenge Problem for AI
    Raymond J. Mooney
    In Proceedings of the 2006 AAAI Fellows Symposium, Boston, MA, July 2006.
    Paper ID: 192

    We present the problem of learning to understand natural language from examples of utterances paired only with their relevant real-world context as an important challenge problem for AI. Machine learning has been adopted as the most effective way of developing natural-language processing systems; however, currently, complex annotated corpora are required for training. By learning language from perceptual context, the need for laborious annotation is removed and the system's resulting understanding is grounded in its perceptual experience.

  17. Using String-Kernels for Learning Semantic Parsers
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the Joint 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL-2006), pp. 913-920, Sydney, Australia, July 2006.
    Paper ID: 191

    We present a new approach for mapping natural language sentences to their formal meaning representations using string-kernel-based classifiers. Our system learns these classifiers for every production in the formal language grammar. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these string classifiers. Our experiments on two real-world data sets show that this approach compares favorably to other existing systems and is particularly robust to noise.

  18. Discriminative Reranking for Semantic Parsing
    Ruifang Ge and Raymond J. Mooney
    In Proceedings of the COLING/ACL-2006 Main Conference Poster Sessions, pp. 263-270, Sydney, Australia, July 2006.
    Paper ID: 190

    Semantic parsing is the task of mapping natural language sentences to complete formal meaning representations. The performance of semantic parsing can be potentially improved by using discriminative reranking, which explores arbitrary global features. In this paper, we investigate discriminative reranking upon a baseline semantic parser, SCISSOR, where the composition of meaning representations is guided by syntax. We examine if features used for syntactic parsing can be adapted for semantic parsing by creating similar semantic features based on the mapping between syntax and semantics. We report experimental results on two real applications, an interpreter for coaching instructions in robotic soccer and a natural-language database interface. The results show that reranking can improve the performance on the coaching interpreter, but not on the database interface.

  19. Integrating Co-occurrence Statistics with Information Extraction for Robust Retrieval of Protein Interactions from Medline
    Razvan Bunescu, Raymond Mooney, Arun Ramani and Edward Marcotte
    In Proceedings of the HLT-NAACL Workshop on Linking Natural Language Processing and Biology: Towards deeper biological literature analysis (BioNLP-2006), pp. 49-56, New York City, NY, June 2006.
    Paper ID: 188

    The task of mining relations from collections of documents is usually approached in two different ways. One type of systems do relation extraction from individual sentences, followed by an aggregation of the results over the entire collection. Other systems follow an entirely different approach, in which co-occurrence counts are used to determine whether the mentioning together of two entities is due to more than simple chance. We show that increased extraction performance can be obtained by combining the two approaches into an integrated relation extraction model.

  20. Learning for Semantic Parsing with Statistical Machine Translation
    Yuk Wah Wong and Raymond J. Mooney
    In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL-2006), pp. 439-446, New York City, NY, June 2006.
    Paper ID: 187

    We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of-the-art statistical machine translation techniques. A word alignment model is used for lexical acquisition, and the parsing model itself can be seen as a syntax-based translation model. We show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring similar amount of supervision, and shows better robustness to variations in task complexity and word order.

  21. Using Encyclopedic Knowledge for Named Entity Disambiguation
    Razvan Bunescu and Marius Pasca
    In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), pp. 9-16, Trento, Italy, April 2006.
    Paper ID: 185

    We present a new method for detecting and disambiguating named entities in open domain text. A disambiguation SVM kernel is trained to exploit the high coverage and rich structure of the knowledge encoded in an online encyclopedia. The resulting model significantly outperforms a less informed baseline.

  22. Learning Semantic Parsers Using Statistical Syntactic Parsing Techniques
    Ruifang Ge
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, February 2006.
    41 pages.
    Also appears as Technical Report UT-AI-TR-06-327, Artificial Intelligence Lab, University of Texas at Austin, February 2006.
    Paper ID: 184

    Most recent work on semantic analysis of natural language has focused on "shallow" semantics such as word-sense disambiguation and semantic role labeling. Our work addresses a more ambitious task we call semantic parsing where natural language sentences are mapped to complete formal meaning representations. We present our system Scissor based on a statistical parser that generates a semantically-augmented parse tree (SAPT), in which each internal node has both a syntactic and semantic label. A compositional-semantics procedure is then used to map the augmented parse tree into a final meaning representation. Training the system requires sentences annotated with augmented parse trees. We evaluate the system in two domains, a natural-language database interface and an interpreter for coaching instructions in robotic soccer. We present experimental results demonstrating that Scissor produces more accurate semantic representations than several previous approaches on long sentences.
    In the future, we intend to pursue several directions in developing more accurate semantic parsing algorithms and automating the annotation process. This work will involve exploring alternative tree representations for better generalization in parsing. We also plan to apply discriminative reranking methods to semantic parsing, which allows exploring arbitrary, potentially correlated features not usable by the baseline learner. We also propose to design a method for automating the SAPT-generation process to alleviate the extra annotation work currently required for training Scissor. Finally, we will investigate the impact of different statistical syntactic parsers on semantic parsing using the automated SAPT-generation process.

  23. A Kernel-based Approach to Learning Semantic Parsers
    Rohit J. Kate
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, November 2005.
    34 pages.
    Also appears as Technical Report UT-AI-05-326, Artificial Intelligence Lab, University of Texas at Austin, November 2005.
    Paper ID: 181

    Semantic parsing involves deep semantic analysis that maps natural language sentences to their formal executable meaning representations. This is a challenging problem and is critical for developing user-friendly natural language interfaces to computing systems. Most of the research in natural language understanding, however, has mainly focused on shallow semantic analysis like case-role analysis or word sense disambiguation. The existing work in semantic parsing either lack the robustness of statistical methods or are applicable only to simple domains where semantic analysis is equivalent to filling a single semantic frame.

    In this proposal, we present a new approach to semantic parsing based on string-kernel-based classification. Our system takes natural language sentences paired with their formal meaning representations as training data. For every production in the formal language grammar, a Support-Vector Machine (SVM) classifier is trained using string similarity as the kernel. Each classifier then gives the probability of the production covering any given natural language string of words. These classifiers are further refined using EM-type iterations based on their performance on the training data. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these classifiers. Our experiments on two real-world data sets that have deep meaning representations show that this approach compares favorably to other existing systems in terms of accuracy and coverage.

    For future work, we propose to extend this approach so that it will also exploit the knowledge of natural language syntax by using the existing syntactic parsers. We also intend to broaden the scope of application domains, for example, domains where the sentences are noisy as typical in speech, or domains where corpora available for training do not have natural language sentences aligned with their unique meaning representations. We aim to test our system on the task of complex relation extraction as well. Finally, we also plan to investigate ways to combine our semantic parser with some recently developed semantic parsers to form committees in order to get the best overall performance.

  24. Learning for Semantic Parsing Using Statistical Machine Translation Techniques
    Yuk Wah Wong
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, October 2005.
    53 pages.
    Also appears as Technical Report UT-AI-05-323, Artificial Intelligence Lab, University of Texas at Austin, October 2005.
    Paper ID: 180

    Semantic parsing is the construction of a complete, formal, symbolic meaning representation of a sentence. While it is crucial to natural language understanding, the problem of semantic parsing has received relatively little attention from the machine learning community. Recent work on natural language understanding has mainly focused on shallow semantic analysis, such as word- sense disambiguation and semantic role labeling. Semantic parsing, on the other hand, involves deep semantic analysis in which word senses, semantic roles and other components are combined to produce useful meaning representations for a particular application domain (e.g. database query). Prior research in machine learning for semantic parsing is mainly based on inductive logic programming or deterministic parsing, which lack some of the robustness that characterizes statistical learning. Existing statistical approaches to semantic parsing, however, are mostly concerned with relatively simple application domains in which a meaning representation is no more than a single semantic frame.

    In this proposal, we present a novel statistical approach to semantic parsing, WASP, which can handle meaning representations with a nested structure. The WASP algorithm learns a semantic parser given a set of sentences annotated with their correct meaning representations. The parsing model is based on the synchronous context-free grammar, where each rule maps a natural-language substring to its meaning representation. The main innovation of the algorithm is its use of state-of-the-art statistical machine translation techniques. A statistical word alignment model is used for lexical acquisition, and the parsing model itself can be seen as an instance of a syntax-based translation model. In initial evaluation on several real-world data sets, we show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring similar amount of supervision, and shows better robustness to variations in task complexity and word order.

    In future work, we intend to pursue several directions in developing accurate semantic parsers for a variety of application domains. This will involve exploiting prior knowledge about the natural-language syntax and the application domain. We also plan to construct a syntax-aware word-based alignment model for lexical acquisition. Finally, we will generalize the learning algorithm to handle context-dependent sentences and accept noisy training data.

  25. A Shortest Path Dependency Kernel for Relation Extraction
    Bunescu, R. C., and Mooney, R.J.
    Appears in Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, B.C., pp. 724--731, October 2005.
    Paper ID: 175

    We present a novel approach to relation extraction, based on the observation that the information required to assert a relationship between two named entities in the same sentence is typically captured by the shortest path between the two entities in the dependency graph. Experiments on extracting top-level relations from the ACE (Automated Content Extraction) newspaper corpus show that the new shortest path dependency kernel outperforms a recent approach based on dependency tree kernels.

  26. Consolidating the Set of Known Human Protein-Protein Interactions in Preparation for Large-Scale Mapping of the Human Interactome
    Ramani, A.K., Bunescu, R.C., Mooney, R.J. and Marcotte, E.M.
    Genome Biology, 6, 5, r40(2005).
    Paper ID: 172

    Background

    Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins.

    Results

    We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover from Medline abstracts 6,580 interactions among 3,737 human proteins. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields, then interactions were identified by the co-occurrence of protein names across the set of Medline abstracts, filtering the interactions with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets.

    Conclusion

    These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.

  27. A Statistical Semantic Parser that Integrates Syntax and Semantics
    Ge, R. and Mooney, R.J.
    Proceedings of the Ninth Conference on Computational Natural Language Learning, Ann Arbor, MI, pp. 9--16, June 2005.
    Paper ID: 171

    We introduce a learning semantic parser, Scissor, that maps natural-language sentences to a detailed, formal, meaning-representation language. It first uses an integrated statistical parser to produce a semantically augmented parse tree, in which each non-terminal node has both a syntactic and a semantic label. A compositional-semantics procedure is then used to map the augmented parse tree into a final meaning representation. We evaluate the system in two domains, a natural-language database interface and an interpreter for coaching instructions in robotic soccer. We present experimental results demonstrating that Scissor produces more accurate semantic representations than several previous approaches.

  28. Mining Knowledge from Text Using Information Extraction
    Mooney, R. J. and Bunescu, R.
    SIGKDD Explorations (special issue on Text Mining and Natural Language Processing), 7, 1 (2005), pp. 3-10.
    Paper ID: 170

    An important approach to text mining involves the use of natural-language information extraction. Information extraction (IE) distills structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between such entities. IE systems can be used to directly extricate abstract knowledge from a text corpus, or to extract concrete data from a set of documents which can then be further analyzed with traditional data-mining techniques to discover more general patterns. We discuss methods and implemented systems for both of these approaches and summarize results on mining real text corpora of biomedical abstracts, job announcements, and product descriptions. We also discuss challenges that arise when employing current information extraction technology to discover knowledge in text.

  29. Subsequence Kernels for Relation Extraction
    Razvan Bunescu and Raymond J. Mooney
    Advances in Neural Information Processing Systems, Vol. 18: Proceedings of the 2005 Conference (NIPS), Y. Weiss, B. Schoelkopf, J. Platt (Eds.), MIT Press, 2006.
    Paper ID: 169

    We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach.

  30. Statistical Relational Learning for Natural Language Information Extraction
    Razvan Bunescu and Raymond J. Mooney
    Introduction to Statistical Relational Learning, Getoor, L. and Taskar, B. (Eds.), pp. 535-552, MIT Press, Cambridge, MA, 2007.
    Paper ID: 165

    Understanding natural language presents many challenging problems that lend themselves to statistical relational learning (SRL). Historically, both logical and probabilistic methods have found wide application in natural language processing (NLP). NLP inevitably involves reasoning about an arbitrary number of entities (people, places, and things) that have an unbounded set of complex relationships between them. Representing and reasoning about unbounded sets of entities and relations has generally been considered a strength of predicate logic. However, NLP also requires integrating uncertain evidence from a variety of sources in order to resolve numerous syntactic and semantic ambiguities. Effectively integrating multiple sources of uncertain evidence has generally been considered a strength of Bayesian probabilistic methods and graphical models. Consequently, NLP problems are particularly suited for SRL methods that combine the strengths of first-order predicate logic and probabilistic graphical models. In this article, we review our recent work on using Relational Markov Networks (RMNs) for information extraction, the problem of identifying phrases in natural language text that refer to specific types of entities. We use the expressive power of RMNs to represent and reason about several specific relationships between candidate entities and thereby collectively identify the appropriate set of phrases to extract. We present experiments on learning to extract protein names from biomedical text, which demonstrate the advantage of this approach over existing IE methods.

  31. Using Biomedical Literature Mining to Consolidate the Set of Known Human Protein-Protein Interactions
    Ramani, A., Marcotte E., Bunescu, R., and Mooney, R.J.
    Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, pp. 46--53, Detroit, MI, June 2005.
    Paper ID: 164

    This paper presents the results of a large-scale effort to construct a comprehensive database of known human protein interactions by combining and linking known interactions from existing databases and then adding to them by automatically mining additional interactions from 750,000 Medline abstracts. The end result is a network of 31,609 interactions amongst 7,748 proteins. The text mining system first identifies protein names in the text using a trained Conditional Random Field (CRF) and then identifies interactions through a filtered co-citation analysis. We also report two new strategies for mining interactions, either by finding explicit statements of interactions in the text using learned pattern-based rules or a Support-Vector Machine using a string kernel. Using information in existing ontologies, the automatically extracted data is shown to be of equivalent accuracy to manually curated data sets.

  32. Learning to Transform Natural to Formal Languages
    Kate, R.J., Wong, Y. W., and Mooney, R.J.
    Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), Pittsburgh, PA, pp. 1062--1068, July 2005.
    Paper ID: 160

    This paper presents a method for inducing transformation rules that map natural-language sentences into a formal query or command language. The approach assumes a formal grammar for the target representation language and learns transformation rules that exploit the non-terminal symbols in this grammar. The learned transformation rules incrementally map a natural-language sentence or its syntactic parse tree into a parse-tree for the target formal language. Experimental results are presented for two corpora, one which maps English instructions into an existing formal coaching language for simulated RoboCup soccer agents, and another which maps English U.S.-geography questions into a database query language. We show that our method performs overall better and faster than previous approaches in both domains.

  33. Text Mining with Information Extraction
    Raymond J. Mooney and Un Yong Nahm
    Multilingualism and Electronic Language Management: Proceedings of the 4th International MIDP Colloquium, 22-23 September 2003, Bloemfontein, South Africa, Daelemans, W., du Plessis, T., Snyman, C. and Teck, L. (Eds.), pp. 141-160, Van Schaik Pub., South Africa, 2005.
    Paper ID: 159

    Text mining concerns looking for patterns in unstructured text. The related task of Information Extraction (IE) is about locating specific items in natural-language documents. This paper presents a framework for text mining, called DiscoTEX (Discovery from Text EXtraction), using a learned information extraction system to transform text into more structured data which is then mined for interesting relationships. The initial version of DiscoTEX integrates an IE module acquired by an IE learning system, and a standard rule induction module. In addition, rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of the underlying extraction system. Encouraging results are presented on applying these techniques to a corpus of computer job announcement postings from an Internet newsgroup.

  34. Collective Information Extraction with Relational Markov Networks
    Razvan Bunescu and Raymond J. Mooney
    Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-2004), pp. 439-446, Barcelona, Spain, July 2004.
    Paper ID: 152

    Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences between different potential extractions could improve overall accuracy. Statistical methods based on undirected graphical models, such as conditional random fields (CRFs), have been shown to be an effective approach to learning accurate IE systems. We present a new IE method that employs Relational Markov Networks (a generalization of CRFs), which can represent arbitrary dependencies between extractions. This allows for "collective information extraction" that exploits the mutual influence between possible extractions. Experiments on learning to extract protein names from biomedical text demonstrate the advantages of this approach.

  35. Using Soft-Matching Mined Rules to Improve Information Extraction
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), pp. 27-32, San Jose, CA, July 2004.
    Paper ID: 150

    By discovering predictive relationships between different pieces of extracted data, data-mining algorithms can be used to improve the accuracy of information extraction. However, textual variation due to typos, abbreviations, and other sources can prevent the productive discovery and utilization of hard-matching rules. Recent methods for inducing soft-matching rules from extracted data can more effectively find and exploit predictive relationships in textual data. This paper presents techniques for using mined soft-matching association rules to increase the accuracy of information extraction. Experimental results on a corpus of computer-science job postings demonstrate that soft-matching rules improve information extraction more effectively than hard-matching rules.

  36. Relational Markov Networks for Collective Information Extraction
    Razvan Bunescu and Raymond J. Mooney
    Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields (SRL-2004), Banff, Canada, July 2004.
    Paper ID: 145

    Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences between different potential extractions could improve overall accuracy. Statistical methods based on undirected graphical models, such as conditional random fields (CRFs), have been shown to be an effective approach to learning accurate IE systems. We present a new IE method that employs Relational Markov Networks, which can represent arbitrary dependencies between extractions. This allows for "collective information extraction" that exploits the mutual influence between possible extractions. Experiments on learning to extract protein names from biomedical text demonstrate the advantages of this approach.

  37. Learning Transformation Rules for Semantic Parsing
    Rohit J. Kate, Yuk Wah Wong, Ruifang Ge, and Raymond J. Mooney
    Unpublished Technical Note, April 2004.
    Paper ID: 140

    This paper presents an approach for inducing transformation rules that map natural-language sentences into a formal semantic representation language. The approach assumes a formal grammar for the target representation language and learns transformation rules that exploit the non-terminal symbols in this grammar. Patterns for the transformation rules are learned using an induction algorithm based on longest-common-subsequences previously developed for an information extraction system. Experimental results are presented on learning to map English coaching instructions for Robocup soccer into an existing formal language for coaching simulated robotic agents.

  38. Learning Semantic Parsers: An Important But Under-Studied Problem
    Raymond J. Mooney
    Papers from the AAAI 2004 Spring Symposium on Language Learning: An Interdisciplinary Perspective, pp. 39-44, Stanford, CA, March 2004.
    Paper ID: 138

    Computational systems that learn to transform natural-language sentences into semantic representations have important practical applications in building natural-language interfaces. They can also provide insight into important issues in human language acquisition. However, within AI, computational linguistics, and machine learning, there has been relatively little research on developing systems that learn such semantic parsers. This paper briefly reviews our own work in this area and presents semantic-parser acquistion as an important challenge problem for AI.

  39. Comparative Experiments on Learning Information Extractors for Proteins and their Interactions
    Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Edward M. Marcotte, Raymond J. Mooney, Arun Kumar Ramani, and Yuk Wah Wong
    Artificial Intelligence in Medicine (Special Issue on Summarization and Information Extraction from Medical Documents), 33, 2 (2005), pp. 139-155.
    Paper ID: 137

    Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data relevant to genes of the human genome from the 11 million abstracts in Medline. However, extraction efforts have been frustrated by the lack of conventions for describing human genes and proteins. We have developed and evaluated a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting information on interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and hidden Markov models are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions more accurately than manually-developed rules.

  40. Integrating Top-down and Bottom-up Approaches in Inductive Logic Programming: Applications in Natural Language Processing and Relational Data Mining
    Lappoon R. Tang
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2003.
    219 pages
    Paper ID: 130

    Inductive Logic Programming (ILP) is the intersection of Machine Learning and Logic Programming in which the learner's hypothesis space is the set of logic programs. There are two major ILP approaches: top-down and bottom-up. The former searches the hypothesis space from general to specific while the latter the other way round. Integrating both approaches has been demonstrated to be more effective. Integrated ILP systems were previously developed for two tasks: learning semantic parsers (Chillin), and mining relational data (Progol). Two new integrated ILP systems for these tasks that overcome limitations of existing methods will be presented.
    Cocktail is a new ILP algorithm for inducing semantic parsers. For this task, two features of a parse state, functional structure and context, provide important information for disambiguation. A bottom-up approach is more suitable for learning the former, while top-down is better for the latter. By allowing both approaches to induce program clauses and choosing the best combination of their results, Cocktail learns more effective parsers. Experimental results on learning natural-language interfaces for two databases demonstrate that it learns more accurate parsers than Chillin, the previous best method for this task.
    Beth is a new integrated ILP algorithm for relational data mining. The Inverse Entailment approach to ILP, implemented in the Progol and Aleph systems, starts with the construction of a bottom clause, the most specific hypothesis covering a seed example. When mining relational data with a large number of background facts, the bottom clause becomes intractably large, making learning very inefficient. A top-down approach heuristically guides the construction of clauses without building a bottom clause; however, it wastes time exploring clauses that cover no positive examples. By using a top-down approach to heuristically guide the construction of generalizations of a bottom clause, Beth combines the strength of both approaches. Learning patterns for detecting potential terrorist activity is a current challenge problem for relational data mining. Experimental results on artificial data for this task with over half a million facts show that Beth is significantly more efficient at discovering such patterns than Aleph and m-Foil, two leading ILP systems.

  41. Learning to Extract Proteins and their Interactions from Medline Abstracts
    Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Raymond J. Mooney, Yuk Wah Wong, Edward M. Marcotte, and Arun Kumar Ramani
    Proceedings of the ICML-2003 Workshop on Machine Learning in Bioinformatics, pp.46-53, Washington DC, August 2003.
    Paper ID: 126

    We present results from a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and hidden Markov models are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions with higher precision than manually-developed rules.

  42. Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction
    Mary Elaine Califf and Raymond J. Mooney
    Journal of Machine Learning Research, 4, (2003), pp. 177-210.
    Paper ID: 124

    Information Extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. We present a aystem, RAPIER, that uses pairs of sample documents and filled templates to induce pattern-match rules that directly extract fillers for the slots in the template. RAPIER employs a bottom-up learning algorithm which incorporates techniques from several inductive logic programming systems and acquires unbounded patterns that include constraints on the words, part-of-speech tags, and semantic classes present in the filler and the surrounding text. We present encouraging experimental results on two domains.

  43. Acquiring Word-Meaning Mappings for Natural Language Interfaces
    Cynthia A. Thompson and Raymond J. Mooney
    Journal of Artificial Intelligence Research, 18, (2003) pp. 1-44.
    Paper ID: 121

    This paper focuses on a system, Wolfie (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. Wolfie is part of an integrated system that learns to parse representations such as logical database queries.
    Experimental results are presented demonstrating Wolfie's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by Wolfie are compared to those acquired by a similar system developed by Siskind (1996), with results favorable to Wolfie. A second set of experiments demonstrates Wolfie's ability to scale to larger and more difficult, albeit artificially generated, corpora.
    In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods (Cohn, Atlas, & Ladner, 1994) attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.

  44. Associative Anaphora Resolution: A Web-Based Approach
    Razvan Bunescu
    Proceedings of the EACL 2003 Workshop on the Computational Treatment of Anaphora, pp. 47-52, Budapest, Hungary, April 2003.
    Paper ID: 120

    We present a novel approach to solving definite descriptions in unrestricted text based on searching the web for a particular type of lexicosyntactic patterns. Using statistics on these patterns, we intend to recover the antecedents for a predefined subset of definite descriptions occurring in two types of anaphoric relations: identity anaphora and associative anaphora. Preliminary results obtained with this method are promising and compare well with other methods.

  45. Machine Learning
    Raymond J. Mooney
    Oxford Handbook of Computational Linguistics, R. Mitkov (Ed.), Oxford University Press, pp. 376-394, 2003.
    Paper ID: 119

    This chapter introduces symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics ranging from part-of-speech tagging and syntactic parsing to word-sense disambiguation and anaphora resolution. Applications to a variety of these problems are reviewed.

  46. Using Multiple Clause Constructors in Inductive Logic Programming for Semantic Parsing
    Lappoon R. Tang and Raymond J. Mooney
    Proceedings of the 12th European Conference on Machine Learning (ECML-2001), pp. 466-477, Freiburg, Germany, September 2001.
    Paper ID: 107

    In this paper, we explored a learning approach which combines different learning methods in inductive logic programming (ILP) to allow a learner to produce more expressive hypothese than that of each individual learner. Such a learning approach may be useful when the performance of the task depends on solving a large amount of classification problems and each has its own characteristics which may or may not fit a particular learning method. The task of sematnic parser acquisition in two different domains was attempted and preliminary results demonstrated that such an approach is promising.

  47. Automated Construction of Database Interfaces: Integrating Statistical and Relational Learning for Semantic Parsing
    Lappoon R. Tang and Raymond J. Mooney
    Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pp. 133 - 141, Hong Kong, October 2000
    Paper ID: 102

    The development of natural language interfaces (NLI's) for databases has been a challenging problem in natural language processing (NLP) since the 1970's. The need for NLI's has become more pronounced due to the widespread access to complex databases now available through the Internet. A challenging problem for empirical NLP is the automated acquisition of NLI's from training examples. We present a method for integrating statistical and relational learning techniques for this task which exploits the strength of both approaches. Experimental results from three different domains suggest that such an approach is more robust than a previous purely logic-based approach.

  48. A Mutually Beneficial Integration of Data Mining and Information Extraction
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), Austin, TX, pp. 627-632, July 2000.
    Paper ID: 100

    Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX, that combines IE and data mining methodologies to perform text mining as well as improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job postings from an Internet newsgroup.

  49. Integrating Statistical and Relational Learning for Semantic Parsing: Applications to Learning Natural Language Interfaces for Databases
    Lappoon R. Tang
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, May 2000.
    50 pages
    Paper ID: 99

    The development of natural language interfaces (NLIs) for databases has been an interesting problem in natural language processing since the 70's. The need for NLIs has become more pronounced given the widespread access to complex databases now available through the Internet. However, such systems are difficult to build and must be tailored to each application. A current research topic involves using machine learning methods to automate the development of NLI's. This proposal presents a method for learning semantic parsers (systems for mapping natural language to logical form) that integrates logic-based and probabilistic methods in order to exploit the complementary strengths of these competing approaches. More precisely, an inductive logic programming (ILP) method, TABULATE, is developed for learning multiple models that are integrated via linear weighted combination to produce probabilistic models for statistical semantic parsing. Initial experimental results from three different domains suggest that an integration of statistical and logical approaches to semantic parsing can outperform a purely logical approach. Future research will further develop this integrated approach and demonstrate its ability to improve the automated development of NLI's.

  50. Automatic Construction of Semantic Lexicons for Learning Natural Language Interfaces
    Cynthia A. Thompson and Raymond J. Mooney
    Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), Orlando, FL, pp. 487-493, July 1999.
    Paper ID: 95

    This paper describes a system, Wolfie (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of words paired with meaning representations. Wolfie is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating Wolfie's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by Wolfie are compared to those acquired by a competing system developed by Siskind.

  51. Relational Learning of Pattern-Match Rules for Information Extraction
    Mary Elaine Califf and Raymond J. Mooney
    Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), Orlando, FL, pp. 328-334, July 1999.
    Paper ID: 94

    Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. This paper presents a system, Rapier, that takes pairs of sample documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. Rapier employs a bottom-up learning algorithm which incorporates techniques from several inductive logic programming systems and acquires unbounded patterns that include constraints on the words, part-of-speech tags, and semantic classes present in the filler and the surrounding text. We present encouraging experimental results on two domains.

  52. Learning for Semantic Interpretation: Scaling Up Without Dumbing Down
    Raymond J. Mooney
    Workshop Notes for the Workshop on Learning Language in Logic Bled, Slovenia, pp. 7-14, June 1999.
    Also appears in Learning Language in Logic, J. Cussens and S. Dzeroski (Eds.), pp. 57-66, Springer Verlag, Berlin, 2000.
    Paper ID: 93

    Most recent research in learning approaches to natural language have studied fairly "low-level" tasks such as morphology, part-of-speech tagging, and syntactic parsing. However, I believe that logical approaches may have the most relevance and impact at the level of semantic interpretation, where a logical representation of sentence meaning is important and useful. We have explored the use of inductive logic programming for learning parsers that map natural-language database queries into executable logical form. This work goes against the growing trend in computational linguistics of focusing on shallow but broad-coverage natural language tasks ("scaling up by dumbing down") and instead concerns using logic-based learning to develop narrower, domain-specific systems that perform relatively deep processing. I first present a historical view of the shifting emphasis of research on various tasks in natural language processing and then briefly review our own work on learning for semantic interpretation. I will then attempt to encourage others to study such problems and explain why I believe logical approaches have the most to offer at the level of producing semantic interpretations of complete sentences.

  53. Active Learning for Natural Language Parsing and Information Extraction
    Cynthia A. Thompson, Mary Elaine Califf and Raymond J. Mooney
    Nominated for Best Paper Award
    Proceedings of the Sixteenth International Machine Learning Conference (ICML-99) , Bled, Slovenia, pp. 406-414, June 1999.
    Paper ID: 92

    In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, existing results for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to two non-classification tasks in natural language processing: semantic parsing and information extraction. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance for these complex tasks.

  54. Semantic Lexicon Acquisition for Learning Natural Language Interfaces
    Cynthia Ann Thompson
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, December 1998.
    101 pages.
    Also appears as Technical Report AI 99-278, Artificial Intelligence Lab, University of Texas at Austin.
    Paper ID: 90

    A long-standing goal for the field of artificial intelligence is to enable computer understanding of human languages. A core requirement in reaching this goal is the ability to transform individual sentences into a form better suited for computer manipulation. This ability, called semantic parsing, requires several knowledge sources, such as a grammar, lexicon, and parsing mechanism.
    Building natural language parsing systems by hand is a tedious, error-prone undertaking. We build on previous research in automating the construction of such systems using machine learning techniques. The result is a combined system that learns semantic lexicons and semantic parsers from one common set of training examples. The input required is a corpus of sentence/representation pairs, where the representations are in the output format desired. A new system, Wolfie, learns semantic lexicons to be used as background knowledge by a previously developed parser acquisition system, Chill. The combined system is tested on a real world domain of answering database queries. We also compare this combination to a combination of Chill with a previously developed lexicon learner, demonstrating superior performance with our system. In addition, we show the ability of the system to learn to process natural languages other than English. Finally, we test the system on an alternate sentence representation, and on a set of large, artificial corpora with varying levels of ambiguity and synonymy.
    One difficulty in using machine learning methods for building natural language interfaces is building the required annotated corpus. Therefore, we also address this issue by using active learning to reduce the number of training examples required by both Wolfie and Chill. Experimental results show that the number of examples needed to reach a given level of performance can be significantly reduced with this method.

  55. Semantic Lexicon Acquisition for Learning Natural Language Interfaces
    Cynthia A. Thompson and Raymond J. Mooney
    Proceedings of the Sixth Workshop on Very Large Corpora, pp. 57-65, Montreal, Quebec, Canada, August 1998.
    TR AI 98-273, Artificial Intelligence Lab, University of Texas at Austin, May 1998.
    Paper ID: 89

    This paper describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with representations of their meaning. The lexicon learned consists of words paired with meaning representations. WOLFIE is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by WOLFIE are compared to those acquired by a competing system developed by Siskind (1996).

  56. Relational Learning Techniques for Natural Language Information Extraction
    Mary Elaine Califf
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 1998.
    142 pages.
    Also appears as Technical Report AI 98-276, Artificial Intelligence Lab, University of Texas at Austin.
    Paper ID: 88

    The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This dissertation presents a novel rule representation specific to natural language and a relational learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be extracted and learns pattern-matching rules to extract fillers for the slots in the template. The system is tested on several domains, showing its ability to learn rules for different tasks. Rapier's performance is compared to a propositional learning system for information extraction, demonstrating the superiority of relational learning for some information extraction tasks. Because one difficulty in using machine learning to develop natural language processing systems is the necessity of providing annotated examples to supervised learning systems, this dissertation also describes an attempt to reduce the number of examples Rapier requires by employing a form of active learning. Experimental results show that the number of examples required to achieve a given level of performance can be significantly reduced by this method.

  57. Relational Learning of Pattern-Match Rules for Information Extraction
    Mary Elaine Califf and Raymond J. Mooney
    Proceedings of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pp. 6-11, Standford, CA, March 1998.
    Paper ID: 80

    Information extraction is a form of shallow text processing which locates a specified set of relevant items in natural language documents. Such systems can be useful, but require domain-specific knowledge and rules, and are time-consuming and difficult to build by hand, making infomation extraction a good testbed for the application of machine learning techniques to natural language processing. This paper presents a system, RAPIER, that takes pairs of documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. The learning algorithm incorporates techniques from several inductive logic programming systems and learns unbounded patterns that include constraints on the words and part-of-speech tags surrounding the filler. Encouraging results are presented on learning to extract information from computer job postings from the newsgroup misc.jobs.offered.

  58. Relational Learning Techniques for Natural Language Information Extraction
    Mary Elaine Califf
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, 1997.
    27 pages
    Paper ID: 78

    The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This paper presents a novel rule representation specific to natural language and a learning system, RAPIER, which learns information extraction rules. RAPIER takes pairs of documents and filled templates indicating the information to be extracted and learns patterns to extract fillers for the slots in the template. This proposal presents initial results on a small corpus of computer-related job postings with a preliminary version of RAPIER. Future research will involve several enhancements to RAPIER as well as more thorough testing on several domains and extension to additional natural language processing tasks. We intend to extend the rule representation and algorithm to allow for more types of constraints than are currently supported. We also plan to incorporate active learning, or sample selection, methods, specifically query by committee, into RAPIER. These methods have the potential to substantially reduce the amount of annotation required. We will explore the issue of distinguishing relevant and irrelevant messages, since currently RAPIER only extracts from the any messages given to it, assuming that all are relevant. We also intend to run much larger tests with RAPIER on multiple domains including the terrorism domain from the third and fourth Message Uncderstanding Conferences, which will allow comparison against other systems. Finally, we plan to demonstrate the generality of RAPIER`s representation and algorithm by applying it to other natural language processing tasks such as word sense disambiguation.

  59. Applying ILP-based Techniques to Natural Language Information Extraction: An Experiment in Relational Learning
    Mary Elaine Califf and Raymond J. Mooney
    Workshop Notes of the IJCAI-97 Workshop on Frontiers of Inductive Logic Programming, pp. 7-11, Nagoya, Japan, August 1997.
    Paper ID: 76

    Information extraction systems process natural language documents and locate a specific set of relevant items. Given the recent success of empirical or corpus-based approaches in other areas of natural language processing, machine learning has the potential to significantly aid the development of these knowledge-intensive systems. This paper presents a system, RAPIER, that takes pairs of documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. The learning algorithm incorporates techniques from several inductive logic programming systems and learns unbounded patterns that include constraints on the words and part-of-speech tags surrounding the filler. Encouraging results are presented on learning to extract information from computer job postings from the newsgroup misc.jobs.offered.

  60. Learning to Parse Natural Language Database Queries into Logical Form
    Cynthia A. Thompson Raymond J. Mooney, and Lappoon R. Tang
    Proceedings of the ML-97 Workshop on Automata Induction, Grammatical Inference, and Language Acquisition, Nashville, TN, July 1997.
    Paper ID: 75

    For most natural language processing tasks, a parser that maps sentences into a semantic representation is significantly more useful than a grammar or automata that simply recognizes syntactically well-formed strings. This paper reviews our work on using inductive logic programming methods to learn deterministic shift-reduce parsers that translate natural language into a semantic representation. We focus on the task of mapping database queries directly into executable logical form. An overview of the system is presented followed by recent experimental results on corpora of Spanish geography queries and English job-search queries.

  61. Relational Learning of Pattern-Match Rules for Information Extraction
    Mary Elaine Califf and Raymond J. Mooney
    Proceedings of the ACL Workshop on Natural Language Learning, pp. 9-15, Madrid, Spain, July 1997.
    Paper ID: 74

    Information extraction systems process natural language documents and locate a specific set of relevant items. Given the recent success of empirical or corpus-based approaches in other areas of natural language processing, machine learning has the potential to significantly aid the development of these knowledge-intensive systems. This paper presents a system, RAPIER, that takes pairs of documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. The learning algorithm incorporates techniques from several inductive logic programming systems and learns unbounded patterns that include constraints on the words and part-of-speech tags surrounding the filler. Encouraging results are presented on learning to extract information from computer job postings from the newsgroup misc.jobs.offered.

  62. Learning Parse and Translation Decisions From Examples With Rich Context
    Ulf Hermjakob and Raymond J. Mooney
    Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL'97/EACL'97), pp. 482-489, July 1997.
    Paper ID: 73

    This paper presents a knowledge and context-based system for parsing and translating natural language and evaluates it on sentences from the Wall Street Journal. Applying machine learning techniques, the system uses parse action examples acquired under supervision to generate a deterministic shift-reduce parser in the form of a decision structure. It relies heavily on context, as encoded in features which describe the morpholgical, syntactical, semantical and other aspects of a given parse state.

  63. Learning Parse and Translation Decisions From Examples With Rich Context
    Ulf Hermjakob
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, May 1997.
    175 pages.
    Also appears as Technical Report AI 97-12, Artifical Intelligence Lab, University of Texas at Austin.
    Paper ID: 72

    The parsing of unrestricted text, with its enormous lexical and structural ambiguity, still poses a great challenge in natural language processing. The difficulties with traditional approaches, which try to master the complexity of parse grammars with hand-crafted rules, have led to a trend towards more empirical techniques.

    We therefore propose a system for parsing and translating natural language that learns from examples and uses some background knowledge.
    As our parsing model we choose a deterministic shift-reduce type parser that integrates part-of-speech tagging and syntactic and semantic processing, which not only makes parsing very efficient, but also assures transparency during the supervised example acquisition.
    Applying machine learning techniques, the system uses parse action examples to generate a parser in the form of a decision structure, a generalization of decision trees.
    To learn good parsing and translation decisions, our system relies heavily on context, as encoded in currently 205 features describing the morphological, syntactical and semantical aspects of a given parse state. Compared with recent probabilistic systems that were trained on 40,000 sentences, our system relies on more background knowledge and a deeper analysis, but radically fewer examples, currently 256 sentences.

    We test our parser on lexically limited sentences from the Wall Street Journal and achieve accuracy rates of 89.8% for labeled precision, 98.4% for part of speech tagging and 56.3% of test sentences without any crossing brackets. Machine translations of 32 Wall Street Journal sentences to German have been evaluated by 10 bilingual volunteers and been graded as 2.4 on a 1.0 (best) to 6.0 (worst) scale for both grammatical correctness and meaning preservation. The translation quality was only minimally better (2.2) when starting each translation with the correct parse tree, which indicates that the parser is quite robust and that its errors have only a moderate impact on final trans- lation quality. These parsing and translation results already compare well with other systems and, given the relatively small training set and amount of overall knowledge used so far, the results suggest that our system Contex can break previous accuracy ceilings when scaled up further.

  64. An Inductive Logic Programming Method for Corpus-based Parser Construction
    John M. Zelle and Raymond J. Mooney
    Unpublished Technical Note, January 1997
    Paper ID: 71

    Empirical methods for building natural language systems has become an important area of research in recent years. Most current approaches are based on propositional learning algorithms and have been applied to the problem of acquiring broad-coverage parsers for relatively shallow (syntactic) representations. This paper outlines an alternative empirical approach based on techniques from a subfield of machine learning known as Inductive Logic Programming (ILP). ILP algorithms, which learn relational (first-order) rules, are used in a parser acquisition system called CHILL that learns rules to control the behavior of a traditional shift-reduce parser. Using this approach, CHILL is able to learn parsers for a variety of different types of analyses, from traditional syntax trees to more meaning-oriented case-role and database query forms. Experimental evidence shows that CHILL performs comparably to propositional learning systems on similar tasks, and is able to go beyond the broad-but-shallow paradigm and learn mappings directly from sentences into useful semantic representations. In a complete database-query application, parsers learned by CHILL outperform an existing hand-crafted system, demonstrating the promise of empricial techniques for automating the construction certain NLP systems.

  65. Semantic Lexicon Acquisition for Learning Parsers
    Cynthia A. Thompson and Raymond J. Mooney
    Unpublished Technical Note, January 1997
    Paper ID: 69

    This paper describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that learns a semantic lexicon from a corpus of sentences paired with representations of their meaning. The lexicon learned consists of words paired with representations of their meaning, and allows for both synonymy and polysemy. WOLFIE is part of an integrated system that learns to parse novel sentences into their meaning representations. Experimental results are presented that demonstrate WOLFIE's ability to learn useful lexicons for a realistic domain. The lexicons learned by WOLFIE are also compared to those learned by another lexical acquisition system, that of Siskind (1996).

  66. Inductive Logic Programming for Natural Language Processing
    Raymond J. Mooney
    Inductive Logic Programming: Selected Papers from the 6th International Workshop, S. Muggleton (Ed.), pp.3-22, Springer Verlag, Berlin, 1997.
    Also appears in Proceedings of the 6th International Inductive Logic Programming Workshop, pp. 205-224, Stockholm, Sweden, August 1996.
    Paper ID: 68

    This paper reviews our recent work on applying inductive logic programming to the construction of natural language processing systems. We have developed a system, CHILL, that learns a parser from a training corpus of parsed sentences by inducing heuristics that control an initial overly-general shift-reduce parser. CHILL learns syntactic parsers as well as ones that translate English database queries directly into executable logical form. The ATIS corpus of airline information queries was used to test the acquisition of syntactic parsers, and CHILL performed competitively with recent statistical methods. English queries to a small database on U.S. geography were used to test the acquisition of a complete natural language interface, and the parser that CHILL acquired was more accurate than an existing hand-coded system. The paper also includes a discussion of several issues this work has raised regarding the capabilities and testing of ILP systems as well as a summary of our current research directions.

  67. Learning to Parse Database Queries using Inductive Logic Programming
    John M. Zelle and Raymond J. Mooney
    Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), pp. 1050-1055, Portland, OR, August 1996.
    Paper ID: 66

    This paper presents recent work using the CHILL parser acquisition system to automate the construction of a natural-language interface for database queries. CHILL treats parser acquisition as the learning of search-control rules within a logic program representing a shift-reduce parser and uses techniques from Inductive Logic Programming to learn relational control knowledge. Starting with a general framework for constructing a suitable logical form, CHILL is able to train on a corpus comprising sentences paired with database queries and induce parsers that map subsequent sentences directly into executable queries. Experimental results with a complete database-query application for U.S. geography show that CHILL is able to learn parsers that outperform a pre-existing, hand-crafted counterpart. These results demonstrate the ability of a corpus-based system to produce more than purely syntactic representations. They also provide direct evidence of the utility of an empirical approach at the level of a complete natural language application.

  68. Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning
    Raymond J. Mooney
    Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing (EMNLP-96), pp. 82-91, Philadelphia, PA, May 1996.
    Paper ID: 62

    This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neural-network, decision-tree, rule-based, and case-based classification techniques. The specific problem tested involves disambiguating six senses of the word "line" using the words in the current and proceeding sentence as context. The statistical and neural-network methods perform the best on this particular problem and we discuss a potential reason for this observed difference. We also discuss the role of bias in machine learning and its importance in explaining performance differences observed on specific problems.

  69. Corpus-Based Lexical Acquisition For Semantic Parsing
    Cynthia Thompson
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, February 1996.
    38 pages
    Paper ID: 57

    Building accurate and efficient natural language processing (NLP) systems is an important and difficult problem. There has been increasing interest in automating this process. The lexicon, or the mapping from words to meanings, is one component that is typically difficult to update and that changes from one domain to the next. Therefore, automating the acquisition of the lexicon is an important task in automating the acquisition of NLP systems. This proposal describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that learns a lexicon from input consisting of sentences paired with representations of their meanings. Preliminary experimental results show that this system can learn correct and useful mappings. The correctness is evaluated by comparing a known lexicon to one learned from the training input. The usefulness is evaluated by examining the effect of using the lexicon learned by WOLFIE to assist a parser acquisition system, where previously this lexicon had to be hand-built. Future work in the form of extensions to the algorithm, further evaluation, and possible applications is discussed.

  70. Lexical Acquisition: A Novel Machine Learning Problem
    Cynthia A. Thompson and Raymond J. Mooney
    Technical Report, Artificial Intelligence Lab, University of Texas at Austin, January 1996.
    Paper ID: 56

    This paper defines a new machine learning problem to which standard machine learning algorithms cannot easily be applied. The problem occurs in the domain of lexical acquisition. The ambiguous and synonymous nature of words causes the difficulty of using standard induction techniques to learn a lexicon. Additionally, negative examples are typically unavailable or difficult to construct in this domain. One approach to solve the lexical acquisition problem is presented, along with preliminary experimental results on an artificial corpus. Future work includes extending the algorithm and performing tests on a more realistic corpus.

  71. Comparative Results on Using Inductive Logic Programming for Corpus-based Parser Construction
    John M. Zelle and Raymond J. Mooney
    Symbolic, Connectionist, and Statistical Approaches to Learning for Natural Language Processing, S. Wermter, E. Riloff and G. Scheler (Eds.), Spring Verlag, 1996.
    Paper ID: 54

    This paper presents results from recent experiments with CHILL, a corpus-based parser acquisition system. CHILL treats language acquisition as the learning of search-control rules within a logic program. Unlike many current corpus-based approaches that use statistical learning algorithms, CHILL uses techniques from inductive logic programming (ILP) to learn relational representations. CHILL is a very flexible system and has been used to learn parsers that produce syntactic parse trees, case-role analyses, and executable database queries. The reported experiments compare CHILL's performance to that of a more naive application of ILP to parser acquisition. The results show that ILP techniques, as employed in CHILL, are a viable alternative to statistical methods and that the control-rule framework is fundamental to CHILL's success.

  72. Learning the Past Tense of English Verbs Using Inductive Logic Programming
    Raymond J. Mooney and Mary Elaine Califf
    Symbolic, Connectionist, and Statistical Approaches to Learning for Natural Language Processing, S. Wermter, E. Riloff and G. Scheler (Eds.), Spring Verlag, 1996.
    Paper ID: 53

    This paper presents results on using a new inductive logic programming method called FOIDL to learn the past tense of English verbs. The past tense task has been widely studied in the context of the symbolic/connectionist debate. Previous papers have presented results using various neural-network and decision-tree learning methods. We have developed a technique for learning a special type of Prolog program called a first-order decision list, defined as an ordered list of clauses each ending in a cut. FOIDL is based on FOIL (Quinlan, 1990) but employs intensional background knowledge and avoids the need for explicit negative examples. It is particularly useful for problems that involve rules with specific exceptions, such as the past-tense task. We present results showing that FOIDL learns a more accurate past-tense generator from significantly fewer examples than all other previous methods.

  73. Inducing Logic Programs without Explicit Negative Examples
    John M. Zelle, Cynthia A. Thompson, Mary Elaine Califf, and Raymond J. Mooney
    Proceedings of the Fifth International Workshop on Inductive Logic Programming, pp. 403-416, Leuven, Belguim, Sepetember 1995.
    Paper ID: 50

    This paper presents a method for learning logic programs without explicit negative examples by exploiting an assumption of output completeness. A mode declaration is supplied for the target predicate and each training input is assumed to be accompanied by all of its legal outputs. Any other outputs generated by an incomplete program implicitly represent negative examples; however, large numbers of ground negative examples never need to be generated. This method has been incorporated into two ILP systems, CHILLIN and IFOIL, both of which use intensional background knowledge. Tests on two natural language acquisition tasks, case-role mapping and past-tense learning, illustrate the advantages of the approach.

  74. Using Inductive Logic Programming to Automate the Construction of Natural Language Parsers
    John M. Zelle
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 1995.
    204 pages
    Also appears as Technical Report AI 96-249, Department of Computer Sciences, University of Texas at Austin
    Paper ID: 48

    Designing computer systems to understand natural language input is a difficult task. In recent years there has been considerable interest in corpus-based methods for constructing natural language parsers. These empirical approaches replace hand-crafted grammars with linguistic models acquired through automated training over language corpora. A common thread among such methods to date is the use of propositional or probablistic representations for the learned knowledge. This dissertation presents an alternative approach based on techniques from a subfield of machine learning known as inductive logic programming (ILP). ILP, which investigates the learning of relational (first-order) rules, provides an empirical method for acquiring knowledge within traditional, symbolic parsing frameworks.

    This dissertation details the architecture, implementation and evaluation of CHILL a computer system for acquiring natural language parsers by training over corpora of parsed text. CHILL treats language acquisition as the learning of search-control rules within a logic program that implements a shift-reduce parser. Control rules are induced using a novel ILP algorithm which handles difficult issues arising in the induction of search-control heuristics. Both the control-rule framework and the induction algorithm are crucial to CHILL's success.

    The main advantage of CHILL over propositional counterparts is its flexibility in handling varied representations. CHILL has produced parsers for various analyses including case-role mapping, detailed syntactic parse trees, and a logical form suitable for expressing first-order database queries. All of these tasks are accomplished within the same framework, using a single, general learning method that can acquire new syntactic and semantic categories for resolving ambiguities.

    Experimental evidence from both aritificial and real-world corpora demonstrate that CHILL learns parsers as well or better than previous artificial neural network or probablistic approaches on comparable tasks. In the database query domain, which goes beyond the scope of previous empirical approaches, the learned parser outperforms an existing hand-crafted system. These results support the claim that ILP techniques as implemented in CHILL represent a viable alternative with significant potential advantages over neural-network, propositional, and probablistic approaches to empirical parser construction.

  75. A Comparison of Two Methods Employing Inductive Logic Programming for Corpus-based Parser Constuction
    John M. Zelle and Raymond J. Mooney
    Working Notes of the IJCAI-95 Workshop on New Approaches to Learning for Natural Language Processing, pp.79-86, Montreal, Quebec, Canada, August 1995.
    Paper ID: 47

    This paper presents results from recent experiments with CHILL, a corpus-based parser acquisition system. CHILL treats grammar acquisition as the learning of search-control rules within a logic program. Unlike many current corpus-based approaches that use propositional or probabilistic learning algorithms, CHILL uses techniques from inductive logic programming (ILP) to learn relational representations. The reported experiments compare CHILL's performance to that of a more naive application of ILP to parser acquisition. The results show that ILP techniques, as employed in CHILL, are a viable alternative to propositional methods and that the control-rule framework is fundamental to CHILL's success.

  76. Acquisition of a Lexicon from Semantic Representations of Sentences
    Cynthia A. Thompson
    Proceedings of the 33rd Annual Meeting of the Association of Computational Linguistics (ACL-95), pp. 335-337, Boston, MA, July 1995.
    Paper ID: 45

    A system, WOLFIE, that acquires a mapping of words to their semantic representation is presented and a preliminary evaluation is performed. Tree least general generalizations (TLGGs) of the representations of input sentences are performed to assist in determining the representations of individual words in the sentences. The best guess for a meaning of a word is the TLGG which overlaps with the highest percentage of sentence representations in which that word appears. Some promising experimental results on a non-artificial data set are presented.

  77. Induction of First-Order Decision Lists: Results on Learning the Past Tense of English Verbs
    Raymond J. Mooney and Mary Elaine Califf
    Journal of Artificial Intelligence Research, 3 (1995) pp. 1-24.
    Paper ID: 44

    This paper presents a method for inducing logic programs from examples that learns a new class of concepts called first-order decision lists, defined as ordered lists of clauses each ending in a cut. The method, called FOIDL, is based on FOIL but employs intensional background knowledge and avoids the need for explicit negative examples. It is particularly useful for problems that involve rules with specific exceptions, such as learning the past-tense of English verbs, a task widely studied in the context of the symbolic/connectionist debate. FOIDL is able to learn concise, accurate programs for this problem from significantly fewer examples than previous methods (both connectionist and symbolic).

  78. Inducing Deterministic Prolog Parsers From Treebanks: A Machine Learning Approach
    John M. Zelle and Raymond J. Mooney
    Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), pp. 748-753, Seattle, WA, July 1994.
    Paper ID: 35

    This paper presents a method for constructing deterministic, context-sensitive, Prolog parsers from corpora of parsed sentences. Our approach uses recent machine learning methods for inducing Prolog rules from examples (inductive logic programming). We discuss several advantages of this method compared to recent statistical methods and present results on learning complete parsers from portions of the ATIS corpus.

  79. Learning Semantic Grammars With Constructive Inductive Logic Programming
    John M. Zelle and Raymond J. Mooney
    Proceedings of the Eleventh National Conference of the American Association for Artificial Intelligence (AAAI-93), pp. 817-822, Washington, D.C. July 1993.
    Paper ID: 25

    Automating the construction of semantic grammars is a difficult and interesting problem for machine learning. This paper shows how the semantic-grammar acquisition problem can be viewed as the learning of search-control heuristics in a logic program. Appropriate control rules are learned using a new first-order induction algorithm that automatically invents useful syntactic and semantic categories. Empirical results show that the learned parsers generalize well to novel sentences and out-perform previous approaches based on connectionist techniques.

  80. Learning Search-Control Heuristics for Logic Programs: Applications to Speedup Learning and Language Acquisition
    John M. Zelle
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, March 1993.
    36 pages
    Paper ID: 21

    This paper presents a general framework, learning search-control heuristics for logic programs, which can be used to improve both the efficiency and accuracy of knowledge-based systems expressed as definite-clause logic programs. The approach combines techniques of explanation-based learning and recent advances in inductive logic programming to learn clause-selection heuristics that guide program execution. Two specific applications of this framework are detailed: dynamic optimization of Prolog programs (improving efficiency) and natural language acquisition (improving accuracy). In the area of program optimization, a prototype system, DOLPHIN, is able to transform some intractable specifications into polynomial-time algorithms, and outperforms competing approaches in several benchmark speedup domains. A prototype language acquisition system, CHILL, is also described. It is capable of automatically acquiring semantic grammars, which uniformly incorprate syntactic and semantic constraints to parse sentences into case-role representations. Initial experiments show that this approach is able to construct accurate parsers which generalize well to novel sentences and significantly outperform previous approaches to learning case-role mapping based on connectionist techniques. Planned extensions of the general framework and the specific applications as well as plans for further evaluation are also discussed.

  81. Learning Plan Schemata From Observation: Explanation-Based Learning for Plan Recognition
    Raymond J. Mooney
    Cognitive Science, 14, 4 (1990), pp. 483--509.
    Paper ID: 1

    This article discusses how explanation-based learning of plan schemata from observation can improve performance of plan recognition. The GENESIS program is presented as an implemented system for narrative text understanding that learns schemata and improves its performance. Learned schemata allow GENESIS to use schema-based understanding techniques when interpreting events and thereby avoid the expensive search associated with plan-based understanding. Learned schemata also function as new concepts that can be used to cluster examples and index events in memory. In addition. experiments are reviewed which demonstrate that human subjects, like GENESIS, can learn a schema by observing, explaining, and generalizing a single specific instance presented in a narrative.

  82. Integrated Learning of Words and their Underlying Concepts
    Raymond J. Mooney
    In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (CogSci), pp. 974-978, Seattle, WA, July 1987.
    Paper ID: ,

    Models of learning word meanings have generally assumed prior knowledge of the concepts to which the words refer. However, novel natural language text or discourse often presents both unknown concepts and words which refer to these concepts. Also, developmental data suggests that the learning of words and their concepts frequently occurs concurrently instead of concept learning proceeding word learning. This paper presents an integrated computational model for acquiring both word meanings and their underlying concepts concurrently. This model is implemented as a word learning component added to the GENESIS explanation-based learning schema acquisition system for narrative understanding. A detailed example is described in which GENESIS learns provisional definitions for the words

  83. Learning Schemata for Natural Language Processing
    Raymond J. Mooney and Gerald F. DeJong
    In Proceedings of the Ninth International Joint Conference on Artificial Intelligence (IJCAI), pp. 681-687, Los Angeles, CA, August 1985.
    Paper ID: 205

    This paper describes a natural language system which improves its own performance through learning. The system processes short English narratives and is able to acquire, from a single narrative, a new schema for a stereotypical set of actions. During the understanding process, the system attempts to construct explanations for characters' actions in terms of the goals their actions were meant to achieve. When the system observes that a character has achieved an interesting goal in a novel way, it generalizes the set of actions they used to achieve this goal into a new schema. The generalization process is a knowledge-based analysis of the causal structure of the narrative which removes unnecessary details while maintaining the validity of the causal explanation. The resulting generalized set of actions is then stored as a new schema and used by the system to correctly process narratives which were previously beyond its capabilities.


mooney@cs.utexas.edu