Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: 1998

  1. Semantic Lexicon Acquisition for Learning Natural Language Interfaces
    [Details] [PDF]
    Cynthia Ann Thompson
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, December 1998. 101 pages. Also appears as Technical Report AI 99-278, Artificial Intelligence Lab, University of Texas at Austin.
    A long-standing goal for the field of artificial intelligence is to enable computer understanding of human languages. A core requirement in reaching this goal is the ability to transform individual sentences into a form better suited for computer manipulation. This ability, called semantic parsing, requires several knowledge sources, such as a grammar, lexicon, and parsing mechanism.
    Building natural language parsing systems by hand is a tedious, error-prone undertaking. We build on previous research in automating the construction of such systems using machine learning techniques. The result is a combined system that learns semantic lexicons and semantic parsers from one common set of training examples. The input required is a corpus of sentence/representation pairs, where the representations are in the output format desired. A new system, Wolfie, learns semantic lexicons to be used as background knowledge by a previously developed parser acquisition system, Chill. The combined system is tested on a real world domain of answering database queries. We also compare this combination to a combination of Chill with a previously developed lexicon learner, demonstrating superior performance with our system. In addition, we show the ability of the system to learn to process natural languages other than English. Finally, we test the system on an alternate sentence representation, and on a set of large, artificial corpora with varying levels of ambiguity and synonymy.
    One difficulty in using machine learning methods for building natural language interfaces is building the required annotated corpus. Therefore, we also address this issue by using active learning to reduce the number of training examples required by both Wolfie and Chill. Experimental results show that the number of examples needed to reach a given level of performance can be significantly reduced with this method.
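    The selective-sampling idea behind this active learning approach can be illustrated with a minimal sketch. The confidence function below is a hypothetical stand-in, not Wolfie's or Chill's actual certainty metric; the point is only that the learner ranks unlabeled examples and asks a human to annotate the ones it is least sure about.

```python
def uncertainty_sample(pool, confidence, batch_size=5):
    """Select the unlabeled examples the current model is least confident
    about, so a human annotates only the most informative ones."""
    ranked = sorted(pool, key=confidence)  # lowest confidence first
    return ranked[:batch_size]

# Toy stand-in confidence: pretend longer queries are harder to parse.
pool = [
    "show flights",
    "list all red states bordering texas rivers",
    "capital of ohio",
    "what rivers run through states bordering the state with the largest population",
]
picked = uncertainty_sample(pool, confidence=lambda s: 1.0 / (1 + len(s.split())))
```

    Each round, the selected examples are annotated, added to the training set, and the model is retrained before the next selection.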
    ML ID: 90
  2. Semantic Lexicon Acquisition for Learning Natural Language Interfaces
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    In Proceedings of the Sixth Workshop on Very Large Corpora, Montreal, Quebec, Canada, August 1998. Also available as TR AI 98-273, Artificial Intelligence Lab, University of Texas at Austin, May 1998.
    This paper describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with representations of their meaning. The lexicon learned consists of words paired with meaning representations. WOLFIE is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by WOLFIE are compared to those acquired by a competing system developed by Siskind (1996).
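    The cross-situational intuition behind learning a lexicon from sentence/representation pairs can be sketched crudely: words that repeatedly co-occur with the same meaning token across paired examples are good lexicon candidates. This is only an illustrative co-occurrence count, not WOLFIE's actual greedy candidate-generation and scoring algorithm; the mini-corpus is hypothetical.

```python
from collections import Counter
from itertools import product

def cooccurrence_counts(corpus):
    """Count how often each (word, meaning-token) pair co-occurs across
    sentence/representation pairs -- a crude cross-situational signal."""
    counts = Counter()
    for sentence, meaning in corpus:
        for w, m in product(sentence.split(), meaning):
            counts[(w, m)] += 1
    return counts

# Hypothetical mini-corpus in the style of database-query pairs.
corpus = [
    ("what is the capital of texas", ["capital", "texas"]),
    ("what is the capital of ohio",  ["capital", "ohio"]),
]
counts = cooccurrence_counts(corpus)
```

    Pairs like ("texas", texas) occur only when both appear, while function words co-occur with everything, which is why a real learner must also penalize overly general pairings.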
    ML ID: 89
  3. Relational Learning Techniques for Natural Language Information Extraction
    [Details] [PDF]
    Mary Elaine Califf
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 1998. 142 pages. Also appears as Artificial Intelligence Laboratory Technical Report AI 98-276.
    The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This dissertation presents a novel rule representation specific to natural language and a relational learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be extracted and learns pattern-matching rules to extract fillers for the slots in the template. The system is tested on several domains, showing its ability to learn rules for different tasks. Rapier's performance is compared to a propositional learning system for information extraction, demonstrating the superiority of relational learning for some information extraction tasks. Because one difficulty in using machine learning to develop natural language processing systems is the necessity of providing annotated examples to supervised learning systems, this dissertation also describes an attempt to reduce the number of examples Rapier requires by employing a form of active learning. Experimental results show that the number of examples required to achieve a given level of performance can be significantly reduced by this method.
    ML ID: 88
  4. Theory Refinement for Bayesian Networks with Hidden Variables
    [Details] [PDF]
    Sowmya Ramachandran and Raymond J. Mooney
    In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98), 454--462, Madison, WI, July 1998.
    While there has been growing interest in the problem of learning Bayesian networks from data, no technique exists for learning or revising Bayesian networks with hidden variables (i.e., variables not represented in the data) that has been shown to be efficient, effective, and scalable through evaluation on real data. The few techniques that exist for revising such networks perform a blind search through a large space of revisions, and are therefore computationally expensive. This paper presents BANNER, a technique for using data to revise a given Bayesian network with noisy-or and noisy-and nodes to improve its classification accuracy. The initial network can be derived directly from a logical theory expressed as propositional rules. BANNER can revise networks with hidden variables, and add hidden variables when necessary. Unlike previous approaches, BANNER employs mechanisms similar to logical theory refinement techniques for using the data to focus the search for effective modifications. Experiments on real-world problems in the domain of molecular biology demonstrate that BANNER can effectively revise fairly large networks to significantly improve their accuracies.
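    The noisy-or node model the paper builds on is compact to state: each active parent independently fails to trigger the effect with some inhibition probability, so the effect is absent only if every active cause is inhibited. This is the standard noisy-or formula, not BANNER's revision procedure itself; the probabilities below are made up.

```python
def noisy_or(parent_states, inhibit_probs, leak=0.0):
    """P(effect) under a noisy-or model: each active parent i independently
    fails to cause the effect with probability inhibit_probs[i]."""
    p_all_fail = 1.0 - leak
    for active, q in zip(parent_states, inhibit_probs):
        if active:
            p_all_fail *= q
    return 1.0 - p_all_fail

# Two active causes with inhibition 0.2 and 0.5, one inactive cause.
p = noisy_or([True, True, False], [0.2, 0.5, 0.3])
# 1 - (0.2 * 0.5) = 0.9
```

    The model needs only one parameter per parent rather than a full conditional probability table, which is what makes such networks tractable to revise.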
    ML ID: 87
  5. Book Recommending Using Text Categorization with Extracted Information
    [Details] [PDF]
    Raymond J. Mooney, Paul N. Bennett, and Loriene Roy
    In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98) Workshop on Recommender Systems, 70-74, Madison, WI, 1998.
    Content-based recommender systems suggest documents, items, and services to users based on learning a profile of the user from rated examples containing information about the given items. Text categorization methods are very useful for this task but generally rely on unstructured text. We have developed a book-recommending system that utilizes semi-structured information about items gathered from the web using simple information extraction techniques. Initial experimental results demonstrate that this approach can produce fairly accurate recommendations.
    ML ID: 86
  6. Text Categorization Through Probabilistic Learning: Applications to Recommender Systems
    [Details] [PDF]
    Paul N. Bennett
    Honors thesis, Department of Computer Sciences, The University of Texas at Austin, 1998.
    With the growth of the World Wide Web, recommender systems have received an increasing amount of attention. Many recommender systems in use today are based on collaborative filtering. This project has focused on LIBRA, a content-based book recommending system. By utilizing text categorization methods and the information available for each book, the system determines a user profile which is used as the basis of recommendations made to the user. Instead of the bag-of-words approach used in many other statistical text categorization approaches, LIBRA parses each text sample into a semi-structured representation. We have used standard Machine Learning techniques to analyze the performance of several algorithms on this learning task. In addition, we analyze the utility of several methods of feature construction and selection (i.e. methods of choosing the representation of an item that the learning algorithm actually uses). After analyzing the system we conclude that good recommendations are produced after a relatively small number of training examples. We also conclude that the feature selection method tested does not improve the performance of these algorithms in any systematic way, though the results indicate other feature selection methods may prove useful. Feature construction, however, while not providing a large increase in performance with the particular construction methods used here, holds promise of providing performance improvements for the algorithms investigated. This text assumes only minor familiarity with concepts of artificial intelligence and should be readable by the upper division computer science undergraduate familiar with basic concepts of probability theory and set theory.
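    The bag-of-words baseline that LIBRA's semi-structured representation is contrasted with can be sketched as a plain multinomial naive Bayes profile over rated examples. This is a generic textbook version with made-up ratings, not LIBRA's actual representation or parameterization.

```python
import math
from collections import Counter

def train_nb(examples):
    """examples: list of (tokens, label) pairs. Returns label counts and
    per-label word counts; smoothing is applied at prediction time."""
    labels = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in labels}
    for tokens, label in examples:
        word_counts[label].update(tokens)
    return labels, word_counts

def predict(labels, word_counts, tokens):
    """Pick the label maximizing log P(label) + sum log P(word|label),
    with add-one smoothing over the joint vocabulary."""
    vocab = {w for c in word_counts.values() for w in c}
    total = sum(labels.values())
    best, best_lp = None, -math.inf
    for label, n in labels.items():
        lp = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens:
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical rated book descriptions.
examples = [
    ("space aliens starship".split(), "liked"),
    ("romance heartbreak".split(), "disliked"),
    ("galaxy starship battles".split(), "liked"),
]
labels, wc = train_nb(examples)
```

    A user profile trained this way ranks new items by the same log-probability score used for classification.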
    ML ID: 85
  7. Theory Refinement of Bayesian Networks with Hidden Variables
    [Details] [PDF]
    Sowmya Ramachandran
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, May 1998. 139 pages. Also appears as Technical Report AI 98-265, Artificial Intelligence Lab, University of Texas at Austin.
    Research in theory refinement has shown that biasing a learner with initial, approximately correct knowledge produces more accurate results than learning from data alone. While techniques have been developed to revise logical and connectionist representations, little has been done to revise probabilistic representations. Bayesian networks are well-established as a sound formalism for representing and reasoning with probabilistic knowledge, and are widely used. There has been a growing interest in the problem of learning Bayesian networks from data. However, there is no existing technique for learning or revising Bayesian networks with hidden variables (i.e., variables not represented in the data) that has been shown to be efficient, effective, and scalable through evaluation on real data. The few techniques that exist for revising such networks perform a blind search through a large space of revisions, and are therefore computationally expensive. This dissertation presents Banner, a technique for using data to revise a given Bayesian network with Noisy-Or and Noisy-And nodes, to improve its classification accuracy. Additionally, the initial network can be derived directly from a logical theory expressed as propositional Horn-clause rules. Banner can revise networks with hidden variables, and add hidden variables when necessary. Unlike previous approaches to this problem, Banner employs mechanisms similar to those used in logical theory refinement techniques for using the data to focus the search for effective modifications to the network. It can also be used to learn networks with hidden variables from data alone. We also introduce Banner-Pr, a technique for revising the parameters of a Bayesian network with Noisy-Or/And nodes that directly exploits the computational efficiency afforded by these models.
    Experiments on several real-world learning problems in domains such as molecular biology and intelligent tutoring systems demonstrate that Banner can effectively and efficiently revise networks to significantly improve their accuracies, and thus learn highly accurate classifiers. Comparisons with the Naive Bayes algorithm show that using the theory refinement approach gives Banner a substantial edge over learning from data alone. We also show that Banner-Pr converges faster and produces more accurate classifiers than an existing algorithm for learning the parameters of a network.
    ML ID: 84
  8. An Experimental Comparison of Genetic Programming and Inductive Logic Programming on Learning Recursive List Functions
    [Details] [PDF]
    Lappoon R. Tang, Mary Elaine Califf, and Raymond J. Mooney
    Technical Report AI 98-271, Artificial Intelligence Lab, University of Texas at Austin, May 1998.
    This paper experimentally compares three approaches to program induction: inductive logic programming (ILP), genetic programming (GP), and genetic logic programming (GLP) (a variant of GP for inducing Prolog programs). Each of these methods was used to induce four simple, recursive, list-manipulation functions. The results indicate that ILP is the most likely to induce a correct program from small sets of random examples, while GP is generally less accurate. GLP performs the worst, and is rarely able to induce a correct program. Interpretations of these results in terms of differences in search methods and inductive biases are presented.
    ML ID: 83
  9. Advantages of Decision Lists and Implicit Negatives in Inductive Logic Programming
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    New Generation Computing, 16(3):263-281, 1998.
    This paper demonstrates the capabilities of FOIDL, an inductive logic programming (ILP) system whose distinguishing characteristics are the ability to produce first-order decision lists, the use of an output completeness assumption as a substitute for negative examples, and the use of intensional background knowledge. The development of FOIDL was originally motivated by the problem of learning to generate the past tense of English verbs; however, this paper demonstrates its superior performance on two different sets of benchmark ILP problems. Tests on the finite element mesh design problem show that FOIDL's decision lists enable it to produce generally more accurate results than a range of methods previously applied to this problem. Tests with a selection of list-processing problems from Bratko's introductory Prolog text demonstrate that the combination of implicit negatives and intensionality allow FOIDL to learn correct programs from far fewer examples than FOIL.
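    The decision-list idea is simple to demonstrate: rules are tried in order, the first whose condition holds decides the output, and later rules serve as defaults, so exceptions naturally precede general cases. The propositional past-tense sketch below is only an illustration of the ordering behavior; FOIDL itself learns first-order rules in Prolog.

```python
def make_decision_list(rules):
    """rules: ordered (condition, action) pairs. The first rule whose
    condition holds decides the output; later rules act as defaults."""
    def apply(word):
        for cond, action in rules:
            if cond(word):
                return action(word)
        return word
    return apply

# Hypothetical ordered rules for English past tense, exceptions first.
past = make_decision_list([
    (lambda w: w == "go",       lambda w: "went"),       # irregular exception
    (lambda w: w.endswith("e"), lambda w: w + "d"),      # e-final verbs
    (lambda w: True,            lambda w: w + "ed"),     # default rule
])
```

    Because "go" is tested before the general rules, the list encodes an exception without any explicit negative examples, which is the advantage the paper exploits.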
    ML ID: 82
  10. Using Multi-Strategy Learning to Improve Planning Efficiency and Quality
    [Details] [PDF]
    Tara A. Estlin
    PhD Thesis, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, 1998.
    Artificial intelligence planning systems have become an important tool for automating a wide variety of tasks. However, even the most current planning algorithms suffer from two major problems. First, they often require infeasible amounts of computation time to solve problems in most domains. Second, they are not guaranteed to return the best solution to a planning problem, and in fact can sometimes return very low-quality solutions. One way to address these problems is to provide a planning system with domain-specific control knowledge, which helps guide the planner towards more promising search paths. Machine learning techniques enable a planning system to automatically acquire search-control knowledge for different applications. A considerable amount of planning and learning research has been devoted to acquiring rules that improve planning efficiency, also known as speedup learning. Much less work has been done in learning knowledge to improve the quality of plans, even though this is an essential feature for many real-world planning systems. Furthermore, even less research has been done in acquiring control knowledge to improve both these metrics.

    The learning system presented in this dissertation, called SCOPE, is a unique approach to learning control knowledge for planning. SCOPE learns domain-specific control rules for a planner that improve both planning efficiency and plan quality, and it is one of the few systems that can learn control knowledge for partial-order planning. SCOPE's architecture integrates explanation-based learning (EBL) with techniques from inductive logic programming. Specifically, EBL is used to constrain an inductive search for control heuristics that help a planner choose between competing plan refinements. Since SCOPE uses a very flexible training approach, its learning algorithm can be easily focused to prefer search paths that are better for particular evaluation metrics. SCOPE is extensively tested on several planning domains, including a logistics transportation domain and a production manufacturing domain. In these tests, it is shown to significantly improve both planning efficiency and quality and is shown to be more robust than a competing approach.

    ML ID: 81
  11. Relational Learning of Pattern-Match Rules for Information Extraction
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 6-11, Stanford, CA, March 1998.
    Information extraction is a form of shallow text processing which locates a specified set of relevant items in natural language documents. Such systems can be useful, but require domain-specific knowledge and rules, and are time-consuming and difficult to build by hand, making information extraction a good testbed for the application of machine learning techniques to natural language processing. This paper presents a system, RAPIER, that takes pairs of documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. The learning algorithm incorporates techniques from several inductive logic programming systems and learns unbounded patterns that include constraints on the words and part-of-speech tags surrounding the filler. Encouraging results are presented on learning to extract information from computer job postings from the newsgroup misc.jobs.offered.
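    The pre-filler / filler / post-filler structure of such rules can be sketched minimally: a filler is whatever bounded span sits between required context tokens. RAPIER's learned rules additionally constrain part-of-speech tags and allow richer patterns; the word-only version and the job-posting snippet below are illustrative assumptions.

```python
def extract(tokens, pre, post, max_len=3):
    """Return slot fillers bounded by pre/post context words -- a crude
    stand-in for a pre-filler / filler / post-filler extraction rule."""
    fillers = []
    for i, t in enumerate(tokens):
        if t == pre:
            for j in range(i + 1, min(i + 1 + max_len, len(tokens))):
                if tokens[j] == post:
                    fillers.append(" ".join(tokens[i + 1:j]))
                    break
    return fillers

# Hypothetical tokenized fragment of a job posting.
tokens = "salary : 50k per year , location : austin".split()
fillers = extract(tokens, ":", "per")
```

    A learned rule set would contain many such patterns per slot, each induced from document/template pairs rather than written by hand.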
    ML ID: 80