Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: All Publications

  1. Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning
    [Details] [PDF] [Slides]
    Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith
    In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), Austin, TX, January 2015.
    Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Previous work has shown that learning sequence models for CCG tagging can be improved by using priors that are sensitive to the formal properties of CCG as well as cross-linguistic universals. We extend this approach to the task of learning a full CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.
    ML ID: 310
  2. Natural Language Semantics using Probabilistic Logic
    [Details] [PDF] [Slides]
    Islam Beltagy
    October 2014. PhD proposal, Department of Computer Science, The University of Texas at Austin.
    With better natural language semantic representations, computers can do more applications more efficiently as a result of better understanding of natural text. However, no single semantic representation at this time fulfills all requirements needed for a satisfactory representation. Logic-based representations like first-order logic capture many of the linguistic phenomena using logical constructs, and they come with standardized inference mechanisms, but standard first-order logic fails to capture the ``graded'' aspect of meaning in languages. Distributional models use contextual similarity to predict the ``graded'' semantic similarity of words and phrases but they do not adequately capture logical structure. In addition, there are a few recent attempts to combine both representations either on the logic side (still, not a graded representation), or in the distribution side(not full logic).

    We propose using probabilistic logic to represent natural language semantics combining the expressivity and the automated inference of logic, and the gradedness of distributional representations. We evaluate this semantic representation on two tasks, Recognizing Textual Entailment (RTE) and Semantic Textual Similarity (STS). Doing RTE and STS better is an indication of a better semantic understanding.

    Our system has three main components, 1. Parsing and Task Representation, 2. Knowledge Base Construction, and 3. Inference The input natural sentences of the RTE/STS task are mapped to logical form using Boxer which is a rule based system built on top of a CCG parser, then they are used to formulate the RTE/STS problem in probabilistic logic. Then, a knowledge base is represented as weighted inference rules collected from different sources like WordNet and on-the-fly lexical rules from distributional semantics. An advantage of using probabilistic logic is that more rules can be added from more resources easily by mapping them to logical rules and weighting them appropriately. The last component is the inference, where we solve the probabilistic logic inference problem using an appropriate probabilistic logic tool like Markov Logic Network (MLN), or Probabilistic Soft Logic (PSL). We show how to solve the inference problems in MLNs efficiently for RTE using a modified closed-world assumption and a new inference algorithm, and how to adapt MLNs and PSL for STS by relaxing conjunctions. Experiments show that our semantic representation can handle RTE and STS reasonably well.

    For the future work, our short-term goals are 1. better RTE task representation and finite domain handling, 2. adding more inference rules, precompiled and on-the-fly, 3. generalizing the modified closed-world assumption, 4. enhancing our inference algorithm for MLNs, and 5. adding a weight learning step to better adapt the weights. On the longer-term, we would like to apply our semantic representation to the question answering task, support generalized quantifiers, contextualize WordNet rules we use, apply our semantic representation to languages other than English, and implement a probabilistic logic Inference Inspector that can visualize the proof structure.

    ML ID: 308
  3. Weakly-Supervised Bayesian Learning of a CCG Supertagger
    [Details] [PDF] [Slides] [Poster]
    Dan Garrette and Chris Dyer and Jason Baldridge and Noah A. Smith
    In Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL-2014), 141--150, Baltimore, MD, June 2014.
    We present a Bayesian formulation for weakly-supervised learning of a Combinatory Categorial Grammar (CCG) supertagger with an HMM. We assume supervision in the form of a tag dictionary, and our prior encourages the use of cross-linguistically common category structures as well as transitions between tags that can combine locally according to CCG's combinators. Our prior is theoretically appealing since it is motivated by language-independent, universal properties of the CCG formalism. Empirically, we show that it yields substantial improvements over previous work that used similar biases to initialize an EM-based learner. Additional gains are obtained by further shaping the prior with corpus-specific information that is extracted automatically from raw text and a tag dictionary.
    ML ID: 307
  4. Inclusive yet Selective: Supervised Distributional Hypernymy Detection
    [Details] [PDF]
    Stephen Roller and Katrin Erk and Gemma Boleda
    In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), 1025--1036, Dublin, Ireland, August 2014.
    We test the Distributional Inclusion Hypothesis, which states that hypernyms tend to occur in a superset of contexts in which their hyponyms are found. We find that this hypothesis only holds when it is applied to relevant dimensions. We propose a robust supervised approach that achieves accuracies of .84 and .85 on two existing datasets and that can be interpreted as selecting the dimensions that are relevant for distributional inclusion.
    ML ID: 306
  5. UTexas: Natural Language Semantics using Distributional Semantics and Probabilistic Logic
    [Details] [PDF]
    Islam Beltagy and Stephen Roller and Gemma Boleda and and Katrin Erk and Raymond J. Mooney
    In The 8th Workshop on Semantic Evaluation (SemEval-2014), 796--801, Dublin, Ireland, August 2014.
    We represent natural language semantics by combining logical and distributional information in probabilistic logic. We use Markov Logic Networks (MLN) for the RTE task, and Probabilistic Soft Logic (PSL) for the STS task. The system is evaluated on the SICK dataset. Our best system achieves 73% accuracy on the RTE task, and a Pearson's correlation of 0.71 on the STS task.
    ML ID: 305
  6. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild
    [Details] [PDF] [Poster]
    Jesse Thomason and Subhashini Venugopalan and Sergio Guadarrama and Kate Saenko and Raymond Mooney
    In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), 1218--1227, Dublin, Ireland, August 2014.
    This paper integrates techniques in natural language processing and computer vision to improve recognition and description of entities and activities in real-world videos. We propose a strategy for generating textual descriptions of videos by using a factor graph to combine visual detections with language statistics. We use state-of-the-art visual recognition systems to obtain confidences on entities, activities, and scenes present in the video. Our factor graph model combines these detection confidences with probabilistic knowledge mined from text corpora to estimate the most likely subject, verb, object, and place. Results on YouTube videos show that our approach improves both the joint detection of these latent, diverse sentence components and the detection of some individual components when compared to using the vision system alone, as well as over a previous n-gram language-modeling approach. The joint detection allows us to automatically generate more accurate, richer sentential descriptions of videos with a wide array of possible content.
    ML ID: 304
  7. Efficient Markov Logic Inference for Natural Language Semantics
    [Details] [PDF] [Poster]
    Islam Beltagy and Raymond J. Mooney
    In Proceedings of the Fourth International Workshop on Statistical Relational AI at AAAI (StarAI-2014), 9--14, Quebec City, Canada, July 2014.
    Using Markov logic to integrate logical and distributional information in natural-language semantics results in complex inference problems involving long, complicated formulae. Current inference methods for Markov logic are ineffective on such problems. To address this problem, we propose a new inference algorithm based on SampleSearch that computes probabilities of complete formulae rather than ground atoms. We also introduce a modified closed-world assumption that significantly reduces the size of the ground network, thereby making inference feasible. Our approach is evaluated on the recognizing textual entailment task, and experiments demonstrate its dramatic impact on the efficiency of inference.
    ML ID: 303
  8. Integrating Visual and Linguistic Information to Describe Properties of Objects
    [Details] [PDF]
    Calvin MacKenzie
    2014. Undergraduate Honors Thesis, Computer Science Department, University of Texas at Austin.
    Generating sentences from images has historically been performed with standalone Computer Vision systems. The idea of combining visual and linguistic information has been gaining traction in the Computer Vision and Natural Language Processing communities over the past several years. The motivation for a combined system is to generate richer linguistic descriptions of images. Standalone vision systems are typically unable to generate linguistically rich descriptions. This approach combines abundant available language data to clean up noisy results from standalone vision systems.

    This thesis investigates the performance of several models which integrate information from language and vision systems in order to describe certain attributes of objects. The attributes used were split into two categories: color attributes and other attributes. Our proposed model was found to be statistically significantly more accurate than the vision system alone for both sets of attributes.

    ML ID: 302
  9. Semantic Parsing using Distributional Semantics and Probabilistic Logic
    [Details] [PDF] [Poster]
    Islam Beltagy and Katrin Erk and Raymond Mooney
    In Proceedings of ACL 2014 Workshop on Semantic Parsing (SP-2014), 7--11, Baltimore, MD, June 2014.
    We propose a new approach to semantic parsing that is not constrained by a fixed formal ontology and purely logical inference. Instead, we use distributional semantics to generate only the relevant part of an on-the-fly ontology. Sentences and the on-the-fly ontology are represented in probabilistic logic. For inference, we use probabilistic logic frameworks like Markov Logic Networks (MLN) and Probabilistic Soft Logic (PSL). This semantic parsing approach is evaluated on two tasks, Textual Entitlement (RTE) and Textual Similarity (STS), both accomplished using inference in probabilistic logic. Experiments show the potential of the approach.
    ML ID: 301
  10. Probabilistic Soft Logic for Semantic Textual Similarity
    [Details] [PDF] [Poster]
    Islam Beltagy and Katrin Erk and Raymond J. Mooney
    In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL-14), 1210--1219, Baltimore, MD, 2014.
    Probabilistic Soft Logic (PSL) is a recently developed framework for probabilistic logic. We use PSL to combine logical and distributional representations of natural-language meaning, where distributional information is represented in the form of weighted inference rules. We apply this framework to the task of Semantic Textual Similarity (STS) (i.e. judging the semantic similarity of natural-language sentences), and show that PSL gives improved results compared to a previous approach based on Markov Logic Networks (MLNs) and a purely distributional approach.
    ML ID: 300
  11. Plan Recognition Using Statistical Relational Models
    [Details] [PDF]
    Sindhu Raghavan and Parag Singla and Raymond J. Mooney
    In Sukthankar, G. and Geib, C. and Bui, H.H. and Pynadath, D. and Goldman, R.P., editors, Plan, Activity, and Intent Recognition: Theory and Practice, 57--85, Burlington, MA, 2014. Morgan Kaufmann.
    Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring plans that best explain observed actions. Most existing approaches to plan recognition and other abductive reasoning tasks either use first-order logic (or subsets of it) or probabilistic graphical models. While the former cannot handle uncertainty in the data, the latter cannot handle structured representations. To overcome these limitations, we explore the application of statistical relational models that combine the strengths of both first-order logic and probabilistic graphical models to plan recognition. Specifically, we introduce two new approaches to abductive plan recognition using Bayesian Logic Programs (BLPs) and Markov Logic Networks (MLNs). Neither of these formalisms is suited for abductive reasoning because of the deductive nature of the underlying logical inference. In this work, we propose approaches to adapt both these formalisms for abductive plan recognition. We present an extensive evaluation of our approaches on three benchmark datasets on plan recognition, comparing them with existing state-of-the-art methods.
    ML ID: 298
  12. Active Multitask Learning Using Both Latent and Supervised Shared Topics
    [Details] [PDF] [Slides]
    Ayan Acharya and Raymond J. Mooney and Joydeep Ghosh
    In Proceedings of the 2014 SIAM International Conference on Data Mining (SDM14), Philadelphia, Pennsylvania, April 2014.
    Multitask learning (MTL) via a shared representation has been adopted to alleviate problems with sparsity of labeled data across different learning tasks. Active learning, on the other hand, reduces the cost of labeling examples by making informative queries over an unlabeled pool of data. Therefore, a unification of both of these approaches can potentially be useful in settings where labeled information is expensive to obtain but the learning tasks or domains have some common characteristics. This paper introduces two such models -- Active Doubly Supervised Latent Dirichlet Allocation (Act-DSLDA) and its non-parametric variation (Act-NPDSLDA) that integrate MTL and active learning in the same framework. These models make use of both latent and supervised shared topics to accomplish multitask learning. Experimental results on both document and image classification show that integrating MTL and active learning along with shared latent and supervised topics is superior to other methods which do not employ all of these components.
    ML ID: 297
  13. Statistical Script Learning with Multi-Argument Events
    [Details] [PDF] [Poster]
    Karl Pichotta and Raymond J. Mooney
    In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), 220--229, Gothenburg, Sweden, April 2014.
    Scripts represent knowledge of stereotypical event sequences that can aid text understanding. Initial statistical methods have been developed to learn probabilistic scripts from raw text corpora; however, they utilize a very impoverished representation of events, consisting of a verb and one dependent argument. We present a script learning approach that employs events with multiple arguments. Unlike previous work, we model the interactions between multiple entities in a script. Experiments on a large corpus using the task of inferring held-out events (the "narrative cloze evaluation") demonstrate that modeling multi-argument events improves predictive accuracy.
    ML ID: 296
  14. University of Texas at Austin KBP 2013 Slot Filling System: Bayesian Logic Programs for Textual Inference
    [Details] [PDF]
    Yinon Bentor and Amelia Harrison and Shruti Bhosale and Raymond Mooney
    In Proceedings of the Sixth Text Analysis Conference (TAC 2013), 2013.
    This document describes the University of Texas at Austin 2013 system for the Knowledge Base Population (KBP) English Slot Filling (SF) task. The UT Austin system builds upon the output of an existing relation extractor by augmenting relations that are explicitly stated in the text with ones that are inferred from the stated relations using probabilistic rules that encode commonsense world knowledge. Such rules are learned from linked open data and are encoded in the form of Bayesian Logic Programs (BLPs), a statistical relational learning framework based on directed graphical models. In this document, we describe our methods for learning these rules, estimating their associated weights, and performing probabilistic and logical inference to infer unseen relations. In the KBP SF task, our system was able to infer several unextracted relations, but its performance was limited by the base level extractor.
    ML ID: 299
  15. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-shot Recognition
    [Details] [PDF] [Poster]
    Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, Kate Saenko
    In Proceedings of the 14th International Conference on Computer Vision (ICCV-2013), 2712--2719, Sydney, Australia, December 2013.
    Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities "in-the-wild". We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action, and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If it cannot find an accurate prediction for a pre-trained model, it finds a less specific answer that is also plausible from a pragmatic standpoint. We use semantic hierarchies learned from the data to help to choose an appropriate level of generalization, and priors learned from web-scale natural language corpora to penalize unlikely combinations of actors/actions/objects; we also use a web-scale language model to "fill in" novel verbs, i.e. when the verb does not appear in the training set. We evaluate our method on a large YouTube corpus and demonstrate it is able to generate short sentence descriptions of video clips better than baseline approaches.
    ML ID: 295
  16. A Multimodal LDA Model Integrating Textual, Cognitive and Visual Modalities
    [Details] [PDF]
    Stephen Roller and Sabine Schulte im Walde
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 1146--1157, Seattle, WA, October 2013.
    Recent investigations into grounded models of language have shown that holistic views of language and perception can provide higher performance than independent views. In this work, we improve a two-dimensional multimodal version of Latent Dirichlet Allocation (Andrews et al., 2009) in various ways. (1) We outperform text-only models in two different evaluations, and demonstrate that low-level visual features are directly compatible with the existing model. (2) We present a novel way to integrate visual features into the LDA model using unsupervised clusters of images. The clusters are directly interpretable and improve on our evaluation tasks. (3) We provide two novel ways to extend the bimodal models to support three or more modalities. We find that the three-, four-, and five-dimensional models significantly outperform models using only one or two modalities, and that nontextual modalities each provide separate, disjoint knowledge that cannot be forced into a shared, latent structure.
    ML ID: 294
  17. Identifying Phrasal Verbs Using Many Bilingual Corpora
    [Details] [PDF] [Poster]
    Karl Pichotta and John DeNero
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 636--646, Seattle, WA, October 2013.
    We address the problem of identifying multiword expressions in a language, focusing on English phrasal verbs. Our polyglot ranking approach integrates frequency statistics from translated corpora in 50 different languages. Our experimental evaluation demonstrates that combining statistical evidence from many parallel corpora using a novel ranking-oriented boosting algorithm produces a comprehensive set of English phrasal verbs, achieving performance comparable to a human-curated set.
    ML ID: 293
  18. Detecting Promotional Content in Wikipedia
    [Details] [PDF] [Slides]
    Shruti Bhosale and Heath Vinicombe and Raymond J. Mooney
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 1851--1857, Seattle, WA, October 2013.
    This paper presents an approach for detecting promotional content in Wikipedia. By incorporating stylometric features, including features based on n-gram and PCFG language models, we demonstrate improved accuracy at identifying promotional articles, compared to using only lexical information and meta-features.
    ML ID: 292
  19. Grounded Language Learning Models for Ambiguous Supervision
    [Details] [PDF] [Slides]
    Joo Hyun Kim
    PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2013.
    Communicating with natural language interfaces is a long-standing, ultimate goal for artificial intelligence (AI) agents to pursue, eventually. One core issue toward this goal is "grounded" language learning, a process of learning the semantics of natural language with respect to relevant perceptual inputs. In order to ground the meanings of language in a real world situation, computational systems are trained with data in the form of natural language sentences paired with relevant but ambiguous perceptual contexts. With such ambiguous supervision, it is required to resolve the ambiguity between a natural language (NL) sentence and a corresponding set of possible logical meaning representations (MR).

    In this thesis, we focus on devising effective models for simultaneously disambiguating such supervision and learning the underlying semantics of language to map NL sentences into proper logical MRs. We present probabilistic generative models for learning such correspondences along with a reranking model to improve the performance further.

    First, we present a probabilistic generative model that learns the mappings from NL sentences into logical forms where the true meaning of each NL sentence is one of a handful of candidate logical MRs. It simultaneously disambiguates the meaning of each sentence in the training data and learns to probabilistically map an NL sentence to its corresponding MR form depicted in a single tree structure. We perform evaluations on the RoboCup sportscasting corpus, proving that our model is more effective than those proposed by previous researchers.

    Next, we describe two PCFG induction models for grounded language learning that extend the previous grounded language learning model of Borschinger, Jones, and Johnson (2011). Borschinger et al.'s approach works well in situations of limited ambiguity, such as in the sportscasting task. However, it does not scale well to highly ambiguous situations when there are large sets of potential meaning possibilities for each sentence, such as in the navigation instruction following task first studied by Chen and Mooney (2011). The two models we present overcome such limitations by employing a learned semantic lexicon as a basic correspondence unit between NL and MR for PCFG rule generation.

    Finally, we present a method of adapting discriminative reranking to grounded language learning in order to improve the performance of our proposed generative models. Although such generative models are easy to implement and are intuitive, it is not always the case that generative models perform best, since they are maximizing the joint probability of data and model, rather than directly maximizing conditional probability. Because we do not have gold-standard references for training a secondary conditional reranker, we incorporate weak supervision of evaluations against the perceptual world during the process of improving model performance.

    All these approaches are evaluated on the two publicly available domains that have been actively used in many other grounded language learning studies. Our methods demonstrate consistently improved performance over those of previous studies in the domains with different languages; this proves that our methods are language-independent and can be generally applied to other grounded learning problems as well. Further possible applications of the presented approaches include summarized machine translation tasks and learning from real perception data assisted by computer vision and robotics.

    ML ID: 291
  20. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
    [Details] [PDF] [Slides]
    Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, Sergio Guadarrama
    In Proceedings of the NAACL HLT Workshop on Vision and Language (WVL '13), 10--19, Atlanta, Georgia, July 2013.
    We present a holistic data-driven technique that generates natural-language descriptions for videos. We combine the output of state-of-the-art object and activity detectors with ``real-world'' knowledge to select the most probable subject-verb-object triplet for describing a video. We show that this knowledge, automatically mined from web-scale text corpora, enhances the triplet selection algorithm by providing it contextual information and leads to a four-fold increase in activity identification. Unlike previous methods, our approach can annotate arbitrary videos without requiring the expensive collection and annotation of a similar training video corpus. We evaluate our technique against a baseline that does not use text-mined knowledge and show that humans prefer our descriptions 61% of the time.
    ML ID: 290
  21. Using Both Latent and Supervised Shared Topics for Multitask Learning
    [Details] [PDF] [Slides]
    Ayan Acharya, Aditya Rawal, Raymond J. Mooney, Eduardo R. Hruschka
    In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 369--384, Prague, Czech Republic, September 2013.
    This paper introduces two new frameworks, Doubly Supervised Latent Dirichlet Allocation (DSLDA) and its non-parametric variation (NP-DSLDA), that integrate two different types of supervision: topic labels and category labels. This approach is particularly useful for multitask learning, in which both latent and supervised topics are shared between multiple categories. Experimental results on both document and image classification show that both types of supervision improve the performance of both DSLDA and NP-DSLDA and that sharing both latent and supervised topics allows for better multitask learning.
    ML ID: 289
  22. Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages
    [Details] [PDF]
    Dan Garrette and Jason Mielens and Jason Baldridge
    To Appear In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013), 583--592, Sofia, Bulgaria, August 2013.
    Developing natural language processing tools for low-resource languages often requires creating resources from scratch. While a variety of semi-supervised methods exist for training from incomplete data, there are open questions regarding what types of training data should be used and how much is necessary. We discuss a series of experiments designed to shed light on such questions in the context of part-of-speech tagging. We obtain timed annotations from linguists for the low-resource languages Kinyarwanda and Malagasy (as well as English) and evaluate how the amounts of various kinds of data affect performance of a trained POS-tagger. Our results show that annotation of word types is the most important, provided a sufficiently capable semi-supervised learning infrastructure is in place to project type information onto a raw corpus. We also show that finite-state morphological analyzers are effective sources of type information when few labeled examples are available.
    ML ID: 288
  23. Online Inference-Rule Learning from Natural-Language Extractions
    [Details] [PDF] [Poster]
    Sindhu Raghavan and Raymond J. Mooney
    In Proceedings of the 3rd Statistical Relational AI (StaRAI-13) workshop at AAAI '13, July 2013.
    In this paper, we consider the problem of learning commonsense knowledge in the form of first-order rules from incomplete and noisy natural-language extractions produced by an off-the-shelf information extraction (IE) system. Much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. The proposed rule learner accounts for this phenomenon by learning rules in which the body of the rule contains relations that are usually explicitly stated, while the head employs a less-frequently mentioned relation that is easily inferred. The rule learner processes training examples in an online manner to allow it to scale to large text corpora. Furthermore, we propose a novel approach to weighting rules using a curated lexical ontology like WordNet. The learned rules along with their parameters are then used to infer implicit information using a Bayesian Logic Program. Experimental evaluation on a machine reading testbed demonstrates the efficacy of the proposed methods.
    ML ID: 287
  24. Adapting Discriminative Reranking to Grounded Language Learning
    [Details] [PDF] [Slides]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013), 218--227, Sofia, Bulgaria, August 2013.
    We adapt discriminative reranking to improve the performance of grounded language acquisition, specifically the task of learning to follow navigation instructions from observation. Unlike conventional reranking used in syntactic and semantic parsing, gold-standard reference trees are not naturally available in a grounded setting. Instead, we show how the weak supervision of response feedback (e.g. successful task completion) can be used as an alternative, experimentally demonstrating that its performance is comparable to training on gold-standard parse trees.
    ML ID: 286
  25. Montague Meets Markov: Deep Semantics with Probabilistic Logical Form
    [Details] [PDF] [Slides]
    Islam Beltagy, Cuong Chau, Gemma Boleda, Dan Garrette, Katrin Erk, Raymond Mooney
    In Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*Sem-2013), 11--21, Atlanta, GA, June 2013.
    We combine logical and distributional representations of natural language meaning by transforming distributional similarity judgments into weighted inference rules using Markov Logic Networks (MLNs). We show that this framework supports both judging sentence similarity and recognizing textual entailment by appropriately adapting the MLN implementation of logical connectives. We also show that distributional phrase similarity, used as textual inference rules created on the fly, improves its performance.
    ML ID: 285
  26. A Formal Approach to Linking Logical Form and Vector-Space Lexical Semantics
    [Details] [PDF]
    Dan Garrette, Katrin Erk, Raymond J. Mooney
    In Harry Bunt, Johan Bos, and Stephen Pulman, editors, Computing Meaning, 27--48, Berlin, 2013. Springer.
    First-order logic provides a powerful and flexible mechanism for representing natural language semantics. However, it is an open question of how best to integrate it with uncertain, weighted knowledge, for example regarding word meaning. This paper describes a mapping between predicates of logical form and points in a vector space. This mapping is then used to project distributional inferences to inference rules in logical form. We then describe first steps of an approach that uses this mapping to recast first-order semantics into the probabilistic models that are part of Statistical Relational AI. Specifically, we show how Discourse Representation Structures can be combined with distributional models for word meaning inside a Markov Logic Network and used to successfully perform inferences that take advantage of logical concepts such as negation and factivity as well as weighted information on word meaning in context.
    ML ID: 284
  27. Learning a Part-of-Speech Tagger from Two Hours of Annotation
    [Details] [PDF] [Slides] [Video]
    Dan Garrette, Jason Baldridge
    In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-13), 138--147, Atlanta, GA, June 2013.
    Most work on weakly-supervised learning for part-of-speech taggers has been based on unrealistic assumptions about the amount and quality of training data. For this paper, we attempt to create true low-resource scenarios by allowing a linguist just two hours to annotate data and evaluating on the languages Kinyarwanda and Malagasy. Given these severely limited amounts of either type supervision (tag dictionaries) or token supervision (labeled sentences), we are able to dramatically improve the learning of a hidden Markov model through our method of automatically generalizing the annotations, reducing noise, and inducing word-tag frequency information.
    ML ID: 283
  28. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
    [Details] [PDF] [Slides]
    Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, Sergio Guadarrama
    In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI-2013), 541--547, July 2013.
    We present a holistic data-driven technique that generates natural-language descriptions for videos. We combine the output of state-of-the-art object and activity detectors with "real-world" knowledge to select the most probable subject-verb-object triplet for describing a video. We show that this knowledge, automatically mined from web-scale text corpora, enhances the triplet selection algorithm by providing it contextual information and leads to a four-fold increase in activity identification. Unlike previous methods, our approach can annotate arbitrary videos without requiring the expensive collection and annotation of a similar training video corpus. We evaluate our technique against a baseline that does not use text-mined knowledge and show that humans prefer our descriptions 61 percent of the time.
    ML ID: 282
  29. Latent Variable Models of Distributional Lexical Semantics
    [Details] [PDF]
    Joseph Reisinger
    PhD Thesis, Department of Computer Science, University of Texas at Austin, May 2012.
    In order to respond to increasing demand for natural language interfaces—and provide meaningful insight into user query intent—fast, scalable lexical semantic models with flexible representations are needed. Human concept organization is a rich phenomenon that has yet to be accounted for by a single coherent psychological framework: Concept generalization is captured by a mixture of prototype and exemplar models, and local taxonomic information is available through multiple overlapping organizational systems. Previous work in computational linguistics on extracting lexical semantic information from unannotated corpora does not provide adequate representational flexibility and hence fails to capture the full extent of human conceptual knowledge. In this thesis I outline a family of probabilistic models capable of capturing important aspects of the rich organizational structure found in human language that can predict contextual variation, selectional preference and feature-saliency norms to a much higher degree of accuracy than previous approaches. These models account for cross-cutting structure of concept organization—i.e. selective attention, or the notion that humans make use of different categorization systems for different kinds of generalization tasks—and can be applied to Web-scale corpora. Using these models, natural language systems will be able to infer a more comprehensive semantic relations, which in turn may yield improved systems for question answering, text classification, machine translation, and information retrieval.
    ML ID: 309
  30. Review Quality Aware Collaborative Filtering
    [Details] [PDF]
    Sindhu Raghavan and Suriya Ganasekar and Joydeep Ghosh
    In Sixth ACM Conference on Recommender Systems (RecSys 2012), 123--130, September 2012.
    Probabilistic matrix factorization (PMF) and other popular approaches to collaborative filtering assume that the ratings given by users for products are genuine, and hence they give equal importance to all available ratings. However, this is not always true due to several reasons including the presence of opinion spam in product reviews. In this paper, the possibility of performing collaborative filtering while attaching weights or quality scores to the ratings is explored. The quality scores, which are determined from the corresponding review data are used to ``up--weight'' or ``down--weight'' the importance given to the individual rating while performing collaborative filtering, thereby improving the accuracy of the predictions. First, the measure used to capture the quality of the ratings is described. Different approaches for estimating the quality score based on the available review information are examined. Subsequently, a mathematical formulation to incorporate quality scores as weights for the ratings in the basic PMF framework is derived. Experimental evaluation on two product categories of a benchmark data set from Amazon.com demonstrates the efficacy of our approach.
    ML ID: 281
  31. Bayesian Logic Programs for Plan Recognition and Machine Reading
    [Details] [PDF] [Slides]
    Sindhu Raghavan
    PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2012. 170.
    Several real world tasks involve data that is uncertain and relational in nature. Traditional approaches like first-order logic and probabilistic models either deal with structured data or uncertainty, but not both. To address these limitations, statistical relational learning (SRL), a new area in machine learning integrating both first-order logic and probabilistic graphical models, has emerged in the recent past. The advantage of SRL models is that they can handle both uncertainty and structured/relational data. As a result, they are widely used in domains like social network analysis, biological data analysis, and natural language processing. Bayesian Logic Programs (BLPs), which integrate both first-order logic and Bayesian networks are a powerful SRL formalism developed in the recent past. In this dissertation, we develop approaches using BLPs to solve two real world tasks -- plan recognition and machine reading.

    Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring cause from effect. In the first part of the dissertation, we develop an approach to abductive plan recognition using BLPs. Since BLPs employ logical deduction to construct the networks, they cannot be used effectively for abductive plan recognition as is. Therefore, we extend BLPs to use logical abduction to construct Bayesian networks and call the resulting model Bayesian Abductive Logic Programs (BALPs).

    In the second part of the dissertation, we apply BLPs to the task of machine reading, which involves automatic extraction of knowledge from natural language text. Most information extraction (IE) systems identify facts that are explicitly stated in text. However, much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. Human readers naturally use common sense knowledge and "read between the lines" to infer such implicit information from the explicitly stated facts. Since IE systems do not have access to common sense knowledge, they cannot perform deeper reasoning to infer implicitly stated facts. Here, we first develop an approach using BLPs to infer implicitly stated facts from natural language text. It involves learning uncertain common sense knowledge in the form of probabilistic first-order rules by mining a large corpus of automatically extracted facts using an existing rule learner. These rules are then used to derive additional facts from extracted information using BLP inference. We then develop an online rule learner that handles the concise, incomplete nature of natural-language text and learns first-order rules from noisy IE extractions. Finally, we develop a novel approach to calculate the weights of the rules using a curated lexical ontology like WordNet.

    Both tasks described above involve inference and learning from partially observed or incomplete data. In plan recognition, the underlying cause or the top-level plan that resulted in the observed actions is not known or observed. Further, only a subset of the executed actions can be observed by the plan recognition system resulting in partially observed data. Similarly, in machine reading, since some information is implicitly stated, they are rarely observed in the data. In this dissertation, we demonstrate the efficacy of BLPs for inference and learning from incomplete data. Experimental comparison on various benchmark data sets on both tasks demonstrate the superior performance of BLPs over state-of-the-art methods.

    ML ID: 280
  32. Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
    [Details] [PDF]
    Dan Garrette and Jason Baldridge
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), 821--831, Jeju, Korea, July 2012.
    Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MIN-GREEDY algorithm (Ravi et al., 2010) as a starting point, we improve it with several intuitive heuristics. We also define a simple HMM emission initialization that takes advantage of the tag dictionary and raw data to capture both the openness of a given tag and its estimated prevalence in the raw data. Altogether, our augmentations produce improvements to performance over the original MIN-GREEDY algorithm for both English and Italian data.
    ML ID: 279
  33. Improving Video Activity Recognition using Object Recognition and Text Mining
    [Details] [PDF] [Slides]
    Tanvi S. Motwani and Raymond J. Mooney
    In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI-2012), 600--605, August 2012.
    Recognizing activities in real-world videos is a challenging AI problem. We present a novel combination of standard activity classification, object recognition, and text mining to learn effective activity recognizers without ever explicitly labeling training videos. We cluster verbs used to describe videos to automatically discover classes of activities and produce a labeled training set. This labeled data is then used to train an activity classifier based on spatio-temporal features. Next, text mining is employed to learn the correlations between these verbs and related objects. This knowledge is then used together with the outputs of an off-the-shelf object recognizer and the trained activity classifier to produce an improved activity recognizer. Experiments on a corpus of YouTube videos demonstrate the effectiveness of the overall approach.
    ML ID: 274
  34. Generative Models of Grounded Language Learning with Ambiguous Supervision
    [Details] [PDF] [Slides]
    Joohyun Kim
    Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, June 2012.

    "Grounded" language learning is the process of learning the semantics of natural language with respect to relevant perceptual inputs. Toward this goal, computational systems are trained with data in the form of natural language sentences paired with relevant but ambiguous perceptual contexts. With such ambiguous supervision, it is required to resolve the ambiguity between a natural language (NL) sentence and a corresponding set of possible logical meaning representations (MR). My research focuses on devising effective models for simultaneously disambiguating such supervision and learning the underlying semantics of language to map NL sentences into proper logical forms. Specifically, I will present two probabilistic generative models for learning such correspondences. The models are applied to two publicly available datasets in two different domains, sportscasting and navigation, and compared with previous work on the same data.

    I will first present a probabilistic generative model that learns the mappings from NL sentences into logical forms where the true meaning of each NL sentence is one of a handful of candidate logical MRs. It simultaneously disambiguates the meaning of each sentence in the training data and learns to probabilistically map a NL sentence to its MR form depicted in a single tree structure. Evaluations are performed on the RoboCup sportscasting corpous, which show that it outperforms previous methods.

    Next, I present a PCFG induction model for grounded language learning that extends the model of Borschinger, Jones, and Johnson (2011) by utilizing a semantic lexicon. Borschinger et al.'s approach works well when there is limited ambiguity such as in the sportscasting task, but it does not scale well to highly ambiguous situations when there are large sets of potential meaning possibilities for each sentence, such as in the navigation instruction following task studied by Chen and Mooney (2011). Our model overcomes such limitations by employing a semantic lexicon as the basic building block for PCFG rule generation. Our model also allows for novel combination of MR outputs when parsing novel test sentences.

    For future work, I propose to extend our PCFG induction model in several ways: improving the lexicon learning algorithm, discriminative re-ranking of top-k parses, and integrating the meaning representation language (MRL) grammar for extra structural information. The longer-term agenda includes applying our approach to summarized machine translation, using real perception data such as robot sensorimeter and images/videos, and joint learning with other natural language processing tasks.

    ML ID: 273
  35. Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
    [Details] [PDF]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL '12), 433--444, Jeju Island, Korea, July 2012.
    "Grounded" language learning employs training data in the form of sentences paired with relevant but ambiguous perceptual contexts. Borschinger et al. (2011) introduced an approach to grounded language learning based on unsupervised PCFG induction. Their approach works well when each sentence potentially refers to one of a small set of possible meanings, such as in the sportscasting task. However, it does not scale to problems with a large set of potential meanings for each sentence, such as the navigation instruction following task studied by Chen and Mooney (2011). This paper presents an enhancement of the PCFG approach that scales to such problems with highly-ambiguous supervision. Experimental results on the navigation task demonstrates the effectiveness of our approach.
    ML ID: 272
  36. Fast Online Lexicon Learning for Grounded Language Acquisition
    [Details] [PDF] [Slides]
    David L. Chen
    In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), 430--439, July 2012.
    Learning a semantic lexicon is often an important first step in building a system that learns to interpret the meaning of natural language. It is especially important in language grounding where the training data usually consist of language paired with an ambiguous perceptual context. Recent work by Chen and Mooney (2011) introduced a lexicon learning method that deals with ambiguous relational data by taking intersections of graphs. While the algorithm produced good lexicons for the task of learning to interpret navigation instructions, it only works in batch settings and does not scale well to large datasets. In this paper we introduce a new online algorithm that is an order of magnitude faster and surpasses the state-of-the-art results. We show that by changing the grammar of the formal meaning representation language and training on additional data collected from Amazon's Mechanical Turk we can further improve the results. We also include experimental results on a Chinese translation of the training data to demonstrate the generality of our approach.
    ML ID: 271
  37. Learning to "Read Between the Lines" using Bayesian Logic Programs
    [Details] [PDF] [Slides]
    Sindhu Raghavan and Raymond J. Mooney and Hyeonseo Ku
    In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), 349--358, July 2012.
    Most information extraction (IE) systems identify facts that are explicitly stated in text. However, in natural language, some facts are implicit, and identifying them requires "reading between the lines". Human readers naturally use common sense knowledge to infer such implicit information from the explicitly stated facts. We propose an approach that uses Bayesian Logic Programs (BLPs), a statistical relational model combining first-order logic and Bayesian networks, to infer additional implicit information from extracted facts. It involves learning uncertain commonsense knowledge (in the form of probabilistic first-order rules) from natural language text by mining a large corpus of automatically extracted facts. These rules are then used to derive additional facts from extracted information using BLP inference. Experimental evaluation on a benchmark data set for machine reading demonstrates the efficacy of our approach.
    ML ID: 270
  38. Learning Language from Ambiguous Perceptual Context
    [Details] [PDF] [Slides]
    David L. Chen
    PhD Thesis, Department of Computer Science, University of Texas at Austin, May 2012. 196.

    Building a computer system that can understand human languages has been one of the long-standing goals of artificial intelligence. Currently, most state-of-the-art natural language processing (NLP) systems use statistical machine learning methods to extract linguistic knowledge from large, annotated corpora. However, constructing such corpora can be expensive and time-consuming due to the expertise it requires to annotate such data. In this thesis, we explore alternative ways of learning which do not rely on direct human supervision. In particular, we draw our inspirations from the fact that humans are able to learn language through exposure to linguistic inputs in the context of a rich, relevant, perceptual environment.

    We first present a system that learned to sportscast for RoboCup simulation games by observing how humans commentate a game. Using the simple assumption that people generally talk about events that have just occurred, we pair each textual comment with a set of events that it could be referring to. By applying an EM-like algorithm, the system simultaneously learns a grounded language model and aligns each description to the corresponding event. The system does not use any prior language knowledge and was able to learn to sportscast in both English and Korean. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans.

    For the sportscasting task, while each comment could be aligned to one of several events, the level of ambiguity was low enough that we could enumerate all the possible alignments. However, it is not always possible to restrict the set of possible alignments to such limited numbers. Thus, we present another system that allows each sentence to be aligned to one of exponentially many connected subgraphs without explicitly enumerating them. The system first learns a lexicon and uses it to prune the nodes in the graph that are unrelated to the words in the sentence. By only observing how humans follow navigation instructions, the system was able to infer the corresponding hidden navigation plans and parse previously unseen instructions in new environments for both English and Chinese data. With the rise in popularity of crowdsourcing, we also present results on collecting additional training data using Amazon’s Mechanical Turk. Since our system only needs supervision in the form of language being used in relevant contexts, it is easy for virtually anyone to contribute to the training data.

    ML ID: 269
  39. Constraint Propagation for Efficient Inference in Markov Logic
    [Details] [PDF] [Slides]
    Tivadar Papai, Parag Singla and Henry Kautz
    In Proceedings of 17th International Conference on Principles and Practice of Constraint Programming (CP 2011), Lecture Notes in Computer Science (LNCS), 691-705, September 2011.
    Many real world problems can be modeled using a combination of hard and soft constraints. Markov Logic is a highly expressive language which represents the underlying constraints by attaching real-valued weights to formulas in first order logic. The weight of a formula represents the strength of the corresponding constraint. Hard constraints are represented as formulas with infinite weight. The theory is compiled into a ground Markov network over which probabilistic inference can be done. For many problems, hard constraints pose a significant challenge to the probabilistic inference engine. However, solving the hard constraints (partially or fully) before hand outside of the probabilistic engine can hugely simplify the ground Markov network and speed probabilistic inference. In this work, we propose a generalized arc consistency algorithm that prunes the domains of predicates by propagating hard constraints. Our algorithm effectively performs unit propagation at a lifted level, avoiding the need to explicitly ground the hard constraints during the pre-processing phase, yielding a potentially exponential savings in space and time. Our approach results in much simplified domains, thereby, making the inference significantly more efficient both in terms of time and memory. Experimental evaluation over one artificial and two real-world datasets show the benefit of our approach.
    ML ID: 268
  40. Online Structure Learning for Markov Logic Networks
    [Details] [PDF] [Slides]
    Tuyen N. Huynh and Raymond J. Mooney
    In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2011), 81-96, September 2011.
    Most existing learning methods for Markov Logic Networks (MLNs) use batch training, which becomes computationally expensive and eventually infeasible for large datasets with thousands of training examples which may not even all fit in main memory. To address this issue, previous work has used online learning to train MLNs. However, they all assume that the model's structure (set of logical clauses) is given, and only learn the model's parameters. However, the input structure is usually incomplete, so it should also be updated. In this work, we present OSL-the first algorithm that performs both online structure and parameter learning for MLNs. Experimental results on two real- world datasets for natural-language field segmentation show that OSL outperforms systems that cannot revise structure.
    ML ID: 267
  41. Abductive Plan Recognition by Extending Bayesian Logic Programs
    [Details] [PDF] [Slides]
    Sindhu Raghavan, Raymond J. Mooney
    In Proceedings of the European Conference on Machine Learning/Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2011), 629-644, September 2011.
    Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring cause from effect. Most existing approaches to plan recognition use either first-order logic or probabilistic graphical models. While the former can- not handle uncertainty, the latter cannot handle structured representations. In or- der to overcome these limitations, we develop an approach to plan recognition using Bayesian Logic Programs (BLPs), which combine first-order logic and Bayesian networks. Since BLPs employ logical deduction to construct the net- works, they cannot be used effectively for plan recognition. Therefore, we extend BLPs to use logical abduction to construct Bayesian networks and call the result- ing model Bayesian Abductive Logic Programs (BALPs). We learn the parame- ters in BALPs using the Expectation Maximization algorithm adapted for BLPs. Finally, we present an experimental evaluation of BALPs on three benchmark data sets and compare its performance with the state-of-the-art for plan recognition.
    ML ID: 266
  42. Building a Persistent Workforce on Mechanical Turk for Multilingual Data Collection
    [Details] [PDF] [Slides]
    David L. Chen and William B. Dolan
    In Proceedings of The 3rd Human Computation Workshop (HCOMP 2011), August 2011.
    Traditional methods of collecting translation and paraphrase data are prohibitively expensive, making the construction of large, new corpora difficult. While crowdsourcing offers a cheap alternative, quality control and scalability can become problematic. We discuss a novel annotation task that uses videos as the stimulus which discourages cheating. In addi- tion, our approach requires only monolingual speakers, thus making it easier to scale since more workers are qualified to contribute. Finally, we employ a multi-tiered payment system that helps retain good workers over the long-term, resulting in a persistent, high-quality workforce. We present the results of one of the largest linguistic data collection efforts to date using Mechanical Turk, yielding 85K English sentences and more than 1k sentences for each of a dozen more languages.
    ML ID: 265
  43. Learning to Interpret Natural Language Navigation Instructions from Observations
    [Details] [PDF] [Slides]
    David L. Chen and Raymond J. Mooney
    In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), 859-865, August 2011.
    The ability to understand natural-language instructions is critical to building intelligent agents that interact with humans. We present a system that learns to transform natural-language navigation instructions into executable formal plans. Given no prior linguistic knowledge, the system learns by simply observing how humans follow navigation instructions. The system is evaluated in three complex virtual indoor environments with numerous objects and landmarks. A previously collected realistic corpus of complex English navigation instructions for these environments is used for training and testing data. By using a learned lexicon to refine inferred plans and a supervised learner to induce a semantic parser, the system is able to automatically learn to correctly interpret a reasonable fraction of the complex instructions in this corpus.
    ML ID: 264
  44. Abductive Markov Logic for Plan Recognition
    [Details] [PDF] [Slides]
    Parag Singla and Raymond J. Mooney
    In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), 1069-1075, August 2011.
    Plan recognition is a form of abductive reasoning that involves inferring plans that best explain sets of observed actions. Most existing approaches to plan recognition and other abductive tasks employ either purely logical methods that do not handle uncertainty, or purely probabilistic methods that do not handle structured representations. To overcome these limitations, this paper introduces an approach to abductive reasoning using a first-order probabilistic logic, specifically Markov Logic Networks (MLNs). It introduces several novel techniques for making MLNs efficient and effective for abduction. Experiments on three plan recognition datasets show the benefit of our approach over existing methods.
    ML ID: 263
  45. Cross-Cutting Models of Lexical Semantics
    [Details] [PDF] [Slides]
    Joseph Reisinger and Raymond Mooney
    In Proceedings of The Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), 1405-1415, July 2011.
    Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.
    ML ID: 262
  46. Panning for Gold: Finding Relevant Semantic Content for Grounded Language Learning
    [Details] [PDF] [Slides]
    David L. Chen and Raymond J. Mooney
    In Proceedings of Symposium on Machine Learning in Speech and Language Processing (MLSLP 2011), June 2011.
    One of the key challenges in grounded language acquisition is resolving the intentions of the expressions. Typically the task involves identifying a subset of records from a list of candidates as the correct meaning of a sentence. While most current work assume complete or partial independence be- tween the records, we examine a scenario in which they are strongly related. By representing the set of potential meanings as a graph, we explicitly encode the relationships between the candidate meanings. We introduce a refinement algorithm that first learns a lexicon which is then used to remove parts of the graphs that are irrelevant. Experiments in a navigation domain shows that the algorithm successfully recovered over three quarters of the correct semantic content.
    ML ID: 261
  47. Fine-Grained Class Label Markup of Search Queries
    [Details] [PDF]
    Joseph Reisinger and Marius Pasca
    In Proceedings of The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), 1200-1209, June 2011.
    We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.
    ML ID: 260
  48. Collecting Highly Parallel Data for Paraphrase Evaluation
    [Details] [PDF] [Slides]
    David L. Chen and William B. Dolan
    In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 190-200, Portland, Oregon, USA, June 2011.
    A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. In addition to being simple and efficient to compute, experiments show that these metrics correlate highly with human judgments.
    ML ID: 259
  49. Extending Bayesian Logic Programs for Plan Recognition and Machine Reading
    [Details] [PDF] [Slides]
    Sindhu V. Raghavan
    Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, May 2011.

    Statistical relational learning (SRL) is the area of machine learning that integrates both first-order logic and probabilistic graphical models. The advantage of these formalisms is that they can handle both uncertainty and structured/relational data. As a result, they are widely used in domains like social network analysis, biological data analysis, and natural language processing. Bayesian Logic Programs (BLPs), which integrate both first-order logic and Bayesian networks are a powerful SRL formalism developed in the recent past. In this proposal, we focus on applying BLPs to two real worlds tasks -- plan recognition and machine reading.

    Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring cause from effect. In the first part of the proposal, we develop an approach to abductive plan recognition using BLPs. Since BLPs employ logical deduction to construct the networks, they cannot be used effectively for plan recognition as is. Therefore, we extend BLPs to use logical abduction to construct Bayesian networks and call the resulting model Bayesian Abductive Logic Programs (BALPs). Experimental evaluation on three benchmark data sets demonstrate that BALPs outperform the existing state-of-art methods like Markov Logic Networks (MLNs) for plan recognition.

    For future work, we propose to apply BLPs to the task of machine reading, which involves automatic extraction of knowledge from natural language text. Present day information extraction (IE) systems that are trained for machine reading are limited by their ability to extract only factual information that is stated explicitly in the text. We propose to improve the performance of an off-the-shelf IE system by inducing general knowledge rules about the domain using the facts already extracted by the IE system. We then use these rules to infer additional facts using BLPs, thereby improving the recall of the underlying IE system. Here again, the standard inference used in BLPs cannot be used to construct the networks. So, we extend BLPs to perform forward inference on all facts extracted by the IE system and then construct the ground Bayesian networks. We initially use an existing inductive logic programming (ILP) based rule learner to learn the rules. In the longer term, we would like to develop a rule/structure learner that is capable of learning an even better set of first-order rules for BLPs.

    ML ID: 258
  50. Improving the Accuracy and Scalability of Discriminative Learning Methods for Markov Logic Networks
    [Details] [PDF] [Slides]
    Tuyen N. Huynh
    PhD Thesis, Department of Computer Science, University of Texas at Austin, May 2011.
    159 pages.

    Many real-world problems involve data that both have complex structures and uncertainty. Statistical relational learning (SRL) is an emerging area of research that addresses the problem of learning from these noisy structured/relational data. Markov logic networks (MLNs), sets of weighted first-order logic formulae, are a simple but powerful SRL formalism that generalizes both first-order logic and Markov networks. MLNs have been successfully applied to a variety of real-world problems ranging from extraction knowledge from text to visual event recognition. Most of the existing learning algorithms for MLNs are in the generative setting: they try to learn a model that is equally capable of predicting the values of all variables given an arbitrary set of evidence; and they do not scale to problems with thousands of examples. However, many real-world problems in structured/relational data are discriminative---where the variables are divided into two disjoint sets input and output, and the goal is to correctly predict the values of the output variables given evidence data about the input variables. In addition, these problems usually involve data that have thousands of examples. Thus, it is important to develop new discriminative learning methods for MLNs that are more accurate and more scalable, which are the topics addressed in this thesis.

    First, we present a new method that discriminatively learns both the structure and parameters for a special class of MLNs where all the clauses are non-recursive ones. Non-recursive clauses arise in many learning problems in Inductive Logic Programming. To further improve the predictive accuracy, we propose a max-margin approach to learning weights for MLNs. Then, to address the issue of scalability, we present CDA, an online max-margin weight learning algorithm for MLNs. Ater that, we present OSL, the first algorithm that performs both online structure learning and parameter learning. Finally, we address an issue arising in applying MLNs to many real-world problems: learning in the presence of many hard constraints. Including hard constraints during training greatly increases the computational complexity of the learning problem. Thus, we propose a simple heuristic for selecting which hard constraints to include during training.

    Experimental results on several real-world problems show that the proposed methods are more accurate, more scalable (can handle problems with thousands of examples), or both more accurate and more scalable than existing learning methods for MLNs.

    ML ID: 257
  51. Online Max-Margin Weight Learning for Markov Logic Networks
    [Details] [PDF] [Slides]
    Tuyen N. Huynh and Raymond J. Mooney
    In Proceedings of the Eleventh SIAM International Conference on Data Mining (SDM11), 642--651, Mesa, Arizona, USA, April 2011.
    Most of the existing weight-learning algorithms for Markov Logic Networks (MLNs) use batch training which becomes computationally expensive and even infeasible for very large datasets since the training examples may not fit in main memory. To overcome this problem, previous work has used online learning algorithms to learn weights for MLNs. However, this prior work has only applied existing online algorithms, and there is no comprehensive study of online weight learning for MLNs. In this paper, we derive a new online algorithm for structured prediction using the primaldual framework, apply it to learn weights for MLNs, and compare against existing online algorithms on three large, real-world datasets. The experimental results show that our new algorithm generally achieves better accuracy than existing methods, especially on noisy datasets.
    ML ID: 255
  52. Implementing Weighted Abduction in Markov Logic
    [Details] [PDF]
    James Blythe, Jerry R. Hobbs, Pedro Domingos, Rohit J. Kate, Raymond J. Mooney
    In Proceedings of the International Conference on Computational Semantics, 55--64, Oxford, England, January 2011.
    Abduction is a method for finding the best explanation for observations. Arguably the most advanced approach to abduction, especially for natural language processing, is weighted abduction, which uses logical formulas with costs to guide inference. But it has no clear probabilistic semantics. In this paper we propose an approach that implements weighted abduction in Markov logic, which uses weighted first-order formulas to represent probabilistic knowledge, pointing toward a sound probabilistic semantics for weighted abduction. Application to a series of challenge problems shows the power and coverage of our approach
    ML ID: 254
  53. Integrating Logical Representations with Probabilistic Information using Markov Logic
    [Details] [PDF] [Slides]
    Dan Garrette, Katrin Erk, Raymond Mooney
    In Proceedings of the International Conference on Computational Semantics, 105--114, Oxford, England, January 2011.
    First-order logic provides a powerful and flexible mechanism for representing natural language semantics. However, it is an open question of how best to integrate it with uncertain, probabilistic knowledge, for example regarding word meaning. This paper describes the first steps of an approach to recasting first-order semantics into the probabilistic models that are part of Statistical Relational AI. Specifically, we show how Discourse Representation Structures can be combined with distributional models for word meaning inside a Markov Logic Network and used to successfully perform inferences that take advantage of logical concepts such as factivity as well as probabilistic information on word meaning in context.
    ML ID: 253
  54. A Mixture Model with Sharing for Lexical Semantics
    [Details] [PDF] [Slides]
    Joseph Reisinger and Raymond J. Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), 1173--1182, MIT, Massachusetts, USA, October 9--11 2010.
    We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: Polysemous word usage is often governed by some common background metaphoric usage (e.g. the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We demonstrate the applicability of tiered clustering, highlighting particular cases where modeling shared structure is beneficial and where it can be detrimental.
    ML ID: 252
  55. Generative Alignment and Semantic Parsing for Learning from Ambiguous Supervision
    [Details] [PDF]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), 543--551, Beijing, China, August 2010.
    We present a probabilistic generative model for learning semantic parsers from ambiguous supervision. Our approach learns from natural language sentences paired with world states consisting of multiple potential logical meaning representations. It disambiguates the meaning of each sentence while simultaneously learning a semantic parser that maps sentences into logical form. Compared to a previous generative model for semantic alignment, it also supports full semantic parsing. Experimental results on the Robocup sportscasting corpora in both English and Korean indicate that our approach produces more accurate semantic alignments than existing methods and also produces competitive semantic parsers and improved language generators.
    ML ID: 251
  56. Learning to Predict Readability using Diverse Linguistic Features
    [Details] [PDF] [Slides]
    Rohit J. Kate, Xiaoqiang Luo, Siddharth Patwardhan, Martin Franz, Radu Florian, Raymond J. Mooney, Salim Roukos and Chris Welty
    In 23rd International Conference on Computational Linguistics (COLING 2010), 2010.
    In this paper we consider the problem of building a system to predict readability of natural-language documents. Our system is trained using diverse features based on syntax and language models which are generally indicative of readability. The experimental results on a dataset of documents from a mix of genres show that the predictions of the learned system are more accurate than the predictions of naive human judges when compared against the predictions of linguistically-trained expert human judges. The experiments also compare the performances of different learning algorithms and different types of feature sets when used for predicting readability
    ML ID: 250
  57. Cross-cutting Models of Distributional Lexical Semantics
    [Details] [PDF] [Slides]
    Joseph S. Reisinger
    June 2010. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    In order to respond to increasing demand for natural language interfaces—and provide meaningful insight into user query intent—fast, scalable lexical semantic models with flexible representations are needed. Human concept organization is a rich epiphenomenon that has yet to be accounted for by a single coherent psychological framework: Concept generalization is captured by a mixture of prototype and exemplar models, and local taxonomic information is available through multiple overlapping organizational systems. Previous work in computational linguistics on extracting lexical semantic information from the Web does not provide adequate representational flexibility and hence fails to capture the full extent of human conceptual knowledge. In this proposal I will outline a family of probabilistic models capable of accounting for the rich organizational structure found in human language that can predict contextual variation, selectional preference and feature-saliency norms to a much higher degree of accuracy than previous approaches. These models account for cross-cutting structure of concept organization—i.e. the notion that humans make use of different categorization systems for different kinds of generalization tasks—and can be applied to Web-scale corpora. Using these models, natural language systems will be able to infer a more comprehensive semantic relations, in turn improving question answering, text classification, machine translation, and information retrieval.
    ML ID: 249
  58. Spherical Topic Models
    [Details] [PDF] [Slides]
    Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J. Mooney
    In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 2010.
    We introduce the Spherical Admixture Model (SAM), a Bayesian topic model for arbitrary L2 normalized data. SAM maintains the same hierarchical structure as Latent Dirichlet Allocation (LDA), but models documents as points on a high-dimensional spherical manifold, allowing a natural likelihood parameterization in terms of cosine distance. Furthermore, SAM can model word absence/presence at the document level, and unlike previous models can assign explicit negative weight to topic terms. Performance is evaluated empirically, both through human ratings of topic quality and through diverse classification tasks from natural language processing and computer vision. In these experiments, SAM consistently outperforms existing models.
    ML ID: 248
  59. Joint Entity and Relation Extraction using Card-Pyramid Parsing
    [Details] [PDF] [Slides]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL-2010), 203--212, Uppsala, Sweden, July 2010.
    Both entity and relation extraction can benefit from being performed jointly, allowing each task to correct the errors of the other. We present a new method for joint entity and relation extraction using a graph we call a "card-pyramid". This graph compactly encodes all possible entities and relations in a sentence, reducing the task of their joint extraction to jointly labeling its nodes. We give an efficient labeling algorithm that is analogous to parsing using dynamic programming. Experimental results show improved results for our joint extraction method compared to a pipelined approach.
    ML ID: 247
  60. Learning for Semantic Parsing Using Statistical Syntactic Parsing Techniques
    [Details] [PDF] [Slides]
    Ruifang Ge
    PhD Thesis, Department of Computer Science, University of Texas at Austin, Austin, TX, May 2010. 165 pages.
    Natural language understanding is a sub-field of natural language processing, which builds automated systems to understand natural language. It is such an ambitious task that it sometimes is referred to as an AI-complete problem, implying that its difficulty is equivalent to solving the central artificial intelligence problem -- making computers as intelligent as people. Despite its complexity, natural language understanding continues to be a fundamental problem in natural language processing in terms of its theoretical and empirical importance.

    In recent years, startling progress has been made at different levels of natural language processing tasks, which provides great opportunity for deeper natural language understanding. In this thesis, we focus on the task of semantic parsing, which maps a natural language sentence into a complete, formal meaning representation in a meaning representation language. We present two novel state-of-the-art learned syntax-based semantic parsers using statistical syntactic parsing techniques, motivated by the following two reasons. First, the syntax-based semantic parsing is theoretically well-founded in computational semantics. Second, adopting a syntax-based approach allows us to directly leverage the enormous progress made in statistical syntactic parsing.

    The first semantic parser, SCISSOR, adopts an integrated syntactic-semantic parsing approach, in which a statistical syntactic parser is augmented with semantic parameters to produce a semantically-augmented parse tree (SAPT). This integrated approach allows both syntactic and semantic information to be available during parsing time to obtain an accurate combined syntactic-semantic analysis. The performance of SCISSOR is further improved by using discriminative reranking for incorporating non-local features. The second semantic parser, SYNSEM, exploits an existing syntactic parser to produce disambiguated parse trees that drive the compositional semantic interpretation. This pipeline approach allows semantic parsing to conveniently leverage the most recent progress in statistical syntactic parsing.

    We report experimental results on two real applications: an interpreter for coaching instructions in robotic soccer and a natural-language database interface, showing that the improvement of SCISSOR and SYNSEM over other systems is mainly on long sentences, where the knowledge of syntax given in the form of annotated SAPTs or syntactic parses from an existing parser helps semantic composition. SYNSEM also significantly improves results with limited training data, and is shown to be robust to syntactic errors.

    ML ID: 246
  61. Online Max-Margin Weight Learning with Markov Logic Networks
    [Details] [PDF] [Slides]
    Tuyen N. Huynh and Raymond J. Mooney
    In Proceedings of the AAAI-10 Workshop on Statistical Relational AI (Star-AI 10), 32--37, Atlanta, GA, July 2010.
    Most of the existing weight-learning algorithms for Markov Logic Networks (MLNs) use batch training which becomes computationally expensive and even infeasible for very large datasets since the training examples may not fit in main memory. To overcome this problem, previous work has used online learning algorithms to learn weights for MLNs. However, this prior work has only applied existing online algorithms, and there is no comprehensive study of online weight learning for MLNs. In this paper, we derive new online algorithms for structured prediction using the primaldual framework, apply them to learn weights forMLNs, and compare against existing online algorithms on two large, real-world datasets. The experimental results show that the new algorithms achieve better accuracy than existing methods.
    ML ID: 245
  62. Bayesian Abductive Logic Programs
    [Details] [PDF] [Slides]
    Sindhu Raghavan and Raymond Mooney
    In Proceedings of the AAAI-10 Workshop on Statistical Relational AI (Star-AI 10), 82--87, Atlanta, GA, July 2010.
    In this paper, we introduce Bayesian Abductive Logic Programs (BALPs), a new formalism that integrates Bayesian Logic Programs (BLPs) and Abductive Logic Programming (ALP) for abductive reasoning. Like BLPs, BALPs also combine first-order logic and Bayesian networks. However, unlike BLPs that use logical deduction to construct Bayes nets, BALPs employ logical abduction. As a result, BALPs are more suited for solving problems like plan/activity recognition and diagnosis that require abductive reasoning. First, we present the necessary enhancements to BLPs in order to support logical abduction. Next, we apply BALPs to the task of plan recognition and demonstrate its efficacy on two data sets. We also compare the performance of BALPs with several existing approaches for abduction.
    ML ID: 244
  63. Authorship Attribution Using Probabilistic Context-Free Grammars
    [Details] [PDF] [Slides]
    Sindhu Raghavan, Adriana Kovashka and Raymond Mooney
    In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-2010), 38--42, 2010.
    In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach involves building a probabilistic context-free grammar for each author and using this grammar as a language model for classification. We evaluate the performance of our method on a wide range of datasets to demonstrate its efficacy.
    ML ID: 243
  64. Using Closed Captions as Supervision for Video Activity Recognition
    [Details] [PDF]
    Sonal Gupta, Raymond J. Mooney
    In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-2010), 1083--1088, Atlanta, GA, July 2010.
    Recognizing activities in real-world videos is a difficult problem exacerbated by background clutter, changes in camera angle and zoom, and rapid camera movements. Large corpora of labeled videos can be used to train automated activity recognition systems, but this requires expensive human labor and time. This paper explores how closed captions that naturally accompany many videos can act as weak supervision that allows automatically collecting ‘labeled’ data for activity recognition. We show that such an approach can improve activity retrieval in soccer videos. Our system requires no manual labeling of video clips and needs minimal human supervision. We also present a novel caption classifier that uses additional linguistic information to determine whether a specific comment refers to an ongoing activity. We demonstrate that combining linguistic analysis and automatically trained activity recognizers can significantly improve the precision of video retrieval.
    ML ID: 242
  65. Multi-Prototype Vector-Space Models of Word Meaning
    [Details] [PDF] [Slides]
    Joseph Reisinger, Raymond J. Mooney
    In Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2010), 109-117, 2010.
    Current vector-space models of lexical semantics create a single “prototype” vector to represent the meaning of a word. However, due to lexical ambiguity, encoding word meaning with a single vector is problematic. This paper presents a method that uses clustering to produce multiple “sense-specific&rdquo vectors for each word. This approach provides a context-dependent vector representation of word meaning that naturally accommodates homonymy and polysemy. Experimental comparisons to human judgements of semantic similarity for both isolated words as well as words in sentential contexts demonstrate the superiority of this approach over both prototype and exemplar based vector-space models.
    ML ID: 241
  66. Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language
    [Details] [PDF]
    David L. Chen, Joohyun Kim, Raymond J. Mooney
    Journal of Artificial Intelligence Research, 37:397--435, 2010.
    We present a novel framework for learning to interpret and generate language using only perceptual context as supervision. We demonstrate its capabilities by developing a system that learns to sportscast simulated robot soccer games in both English and Korean without any language-specific prior knowledge. Training employs only ambiguous supervision consisting of a stream of descriptive textual comments and a sequence of events extracted from the simulation trace. The system simultaneously establishes correspondences between individual comments and the events that they describe while building a translation model that supports both parsing and generation. We also present a novel algorithm for learning which events are worth describing. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans for our limited domain.
    ML ID: 240
  67. Learning Language from Perceptual Context
    [Details] [PDF] [Slides]
    David L. Chen
    December 2009. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    Most current natural language processing (NLP) systems are built using statistical learning algorithms trained on large annotated corpora which can be expensive and time-consuming to collect. In contrast, humans can learn language through exposure to linguistic input in the context of a rich, relevant, perceptual environment. If a machine learning system can acquire language in a similar manner without explicit human supervision, then it can leverage the large amount of available text that refers to observed world states (e.g. sportscasts, instruction manuals, weather forecasts, etc.) Thus, my research focuses on how to build systems that use both text and the perceptual context in which it is used in order to learn a language. I will first present a system we completed that can describe events in RoboCup 2D simulation games by learning only from sample language commentaries paired with traces of simulated activities without any language-specific prior knowledge. By applying an EM-like algorithm, the system was able to simultaneously learn a grounded language model as well as align the ambiguous training data. Human evaluations of the generated commentaries indicate they are of reasonable quality and in some cases even on par with those produced by humans. For future work, I am proposing to solve the more complex task of learning how to give and receive navigation instructions in a virtual environment. In this setting, each instruction corresponds to a navigation plan that is not directly observable. Since an exponential number of plans can all lead to the same observed actions, we have to learn from compact representations of all valid plans rather than enumerating all possible meanings as we did in the sportscasting task. Initially, the system will passively observe a human giving instruction to another human, and try to learn the correspondences between the instructions and the intended plan. After the system has a decent understanding of the language, it can then participate in the interactions to learn more directly by playing either the role of the instructor or the follower.
    ML ID: 239
  68. Discriminative Learning with Markov Logic Networks
    [Details] [PDF] [Slides]
    Tuyen N. Huynh
    October 2009. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    Statistical relational learning (SRL) is an emerging area of research that addresses the problem of learning from noisy structured/relational data. Markov logic networks (MLNs), sets of weighted clauses, are a simple but powerful SRL formalism that combines the expressivity of first-order logic with the flexibility of probabilistic reasoning. Most of the existing learning algorithms for MLNs are in the generative setting: they try to learn a model that maximizes the likelihood of the training data. However, most of the learning problems in relational data are discriminative. So to utilize the power of MLNs, we need discriminative learning methods that well match these discriminative tasks.

    In this proposal, we present two new discriminative learning algorithms for MLNs. The first one is a discriminative structure and weight learner for MLNs with non-recursive clauses. We use a variant of Aleph, an off-the-shelf Inductive Logic Programming (ILP) system, to learn a large set of Horn clauses from the training data, then we apply an L1-regularization weight learner to select a small set of non-zero weight clauses that maximizes the conditional log-likelihood (CLL) of the training data. The experimental results show that our proposed algorithm outperforms existing learning methods for MLNs and traditional ILP systems in term of predictive accuracy, and its performance is comparable to state-of-the-art results on some ILP benchmarks. The second algorithm we present is a max-margin weight learner for MLNs. Instead of maximizing the CLL of the data like all existing discriminative weight learners for MLNs, the new weight learner tries to maximize the ratio between the probability of the correct label (the observable data) and and the closest incorrect label (among all the wrong labels, this one has the highest probability), which can be formulated as an optimization problem called 1-slack structural SVM. This optimization problem can be solved by an efficient algorithm based on the cutting plane method. However, this cutting plane algorithm requires an efficient inference method as a subroutine. Unfortunately, exact inference in MLNs is intractable. So we develop a new approximation inference method for MLNs based on Linear Programming relaxation. Extensive experiments in two real-world MLN applications demonstrate that the proposed max-margin weight learner generally achieves higher F1 scores than the current best discriminative weight learner for MLNs.

    For future work, our short-term goal is to develop a more efficient inference algorithm and test our max-margin weight learner on more complex problems where there are complicated relationships between the input and output variables and among the outputs. In the longer-term, our plan is to develop more efficient learning algorithms through online learning and algorithms that revise both the clauses and their weights to improve predictive performance.

    ML ID: 238
  69. Spherical Topic Models
    [Details] [PDF]
    Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond Mooney
    In NIPS'09 workshop: Applications for Topic Models: Text and Beyond, 2009.
    We introduce the Spherical Admixture Model (SAM), a Bayesian topic model over arbitrary L2 normalized data. SAM models documents as points on a high- dimensional spherical manifold, and is capable of representing negative word- topic correlations and word presence/absence, unlike models with multinomial document likelihood, such as LDA. In this paper, we evaluate SAM as a topic browser, focusing on its ability to model “negative” topic features, and also as a dimensionality reduction method, using topic proportions as features for difficult classification tasks in natural language processing and computer vision.
    ML ID: 237
  70. Activity Retrieval in Closed Captioned Videos
    [Details] [PDF]
    Sonal Gupta
    Masters Thesis, Department of Computer Sciences, University of Texas at Austin, August 2009. 64 pages.
    Recognizing activities in real-world videos is a difficult problem exacerbated by background clutter, changes in camera angle and zoom, occlusion and rapid camera movements. Large corpora of labeled videos can be used to train automated activity recognition systems, but this requires expensive human labor and time. This thesis explores how closed captions that naturally accompany many videos can act as weak supervision that allows automatically collecting “labeled” data for activity recognition. We show that such an approach can improve activity retrieval in soccer videos. Our system requires no manual labeling of video clips and needs minimal human supervision. We also present a novel caption classifier that uses additional linguistic information to determine whether a specific comment refers to an ongoing activity. We demonstrate that combining linguistic analysis and automatically trained activity recognizers can significantly improve the precision of video retrieval.
    ML ID: 236
  71. Learning with Markov Logic Networks: Transfer Learning, Structure Learning, and an Application to Web Query Disambiguation
    [Details] [PDF]
    Lilyana Mihalkova
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2009. 176 pages.
    Traditionally, machine learning algorithms assume that training data is provided as a set of independent instances, each of which can be described as a feature vector. In contrast, many domains of interest are inherently multi-relational, consisting of entities connected by a rich set of relations. For example, the participants in a social network are linked by friendships, collaborations, and shared interests. Likewise, the users of a search engine are related by searches for similar items and clicks to shared sites. The ability to model and reason about such relations is essential not only because better predictive accuracy is achieved by exploiting this additional information, but also because frequently the goal is to predict whether a set of entities are related in a particular way. This thesis falls within the area of Statistical Relational Learning (SRL), which combines ideas from two traditions within artificial intelligence, first-order logic and probabilistic graphical models, to address the challenge of learning from multi-relational data. We build on one particular SRL model, Markov logic networks (MLNs), which consist of a set of weighted first-order-logic formulae and provide a principled way of defining a probability distribution over possible worlds. We develop algorithms for learning of MLN structure both from scratch and by transferring a previously learned model, as well as an application of MLNs to the problem of Web query disambiguation. The ideas we present are unified by two main themes: the need to deal with limited training data and the use of bottom-up learning techniques.

    Structure learning, the task of automatically acquiring a set of dependencies among the relations in the domain, is a central problem in SRL. We introduce BUSL, an algorithm for learning MLN structure from scratch that proceeds in a more bottom-up fashion, breaking away from the tradition of top-down learning typical in SRL. Our approach first constructs a novel data structure called a Markov network template that is used to restrict the search space for clauses. Our experiments in three relational domains demonstrate that BUSL dramatically reduces the search space for clauses and attains a significantly higher accuracy than a structure learner that follows a top-down approach.

    Accurate and efficient structure learning can also be achieved by transferring a model obtained in a source domain related to the current target domain of interest. We view transfer as a revision task and present an algorithm that diagnoses a source MLN to determine which of its parts transfer directly to the target domain and which need to be updated. This analysis focuses the search for revisions on the incorrect portions of the source structure, thus speeding up learning. Transfer learning is particularly important when target-domain data is limited, such as when data on only a few individuals is available from domains with hundreds of entities connected by a variety of relations. We also address this challenging case and develop a general transfer learning approach that makes effective use of such limited target data in several social network domains.

    Finally, we develop an application of MLNs to the problem of Web query disambiguation in a more privacy-aware setting where the only information available about a user is that captured in a short search session of 5--6 previous queries on average. This setting contrasts with previous work that typically assumes the availability of long user-specific search histories. To compensate for the scarcity of user-specific information, our approach exploits the relations between users, search terms, and URLs. We demonstrate the effectiveness of our approach in the presence of noise and show that it outperforms several natural baselines on a large data set collected from the MSN search engine.

    ML ID: 235
  72. Max-Margin Weight Learning for Markov Logic Networks
    [Details] [PDF] [Slides]
    Tuyen N. Huynh and Raymond J. Mooney
    In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Part 1, 564--579, Bled, Slovenia, September 2009.
    Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing discriminative weight learning methods for MLNs all try to learn weights that optimize the Conditional Log Likelihood (CLL) of the training examples. In this work, we present a new discriminative weight learning method for MLNs based on a max-margin framework. This results in a new model, Max-Margin Markov Logic Networks (M3LNs), that combines the expressiveness of MLNs with the predictive accuracy of structural Support Vector Machines (SVMs). To train the proposed model, we design a new approximation algorithm for loss-augmented inference in MLNs based on Linear Programming (LP). The experimental result shows that the proposed approach generally achieves higher F1 scores than the current best discriminative weight learner for MLNs.
    ML ID: 234
  73. Learning to Disambiguate Search Queries from Short Sessions
    [Details] [PDF]
    Lilyana Mihalkova and Raymond Mooney
    In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Part 2, 111--127, Bled, Slovenia, September 2009.
    Web searches tend to be short and ambiguous. It is therefore not surprising that Web query disambiguation is an actively researched topic. To provide a personalized experience for a user, most existing work relies on search engine log data in which the search activities of that particular user, as well as other users, are recorded over long periods of time. Such approaches may raise privacy concerns and may be difficult to implement for pragmatic reasons. We present an approach to Web query disambiguation that bases its predictions only on a short glimpse of user search activity, captured in a brief session of 4--6 previous searches on average. Our method exploits the relations of the current search session to previous similarly short sessions of other users in order to predict the user's intentions and is based on Markov logic, a statistical relational learning model that has been successfully applied to challenging language problems in the past. We present empirical results that demonstrate the effectiveness of our proposed approach on data collected from a commercial general-purpose search engine.
    ML ID: 233
  74. Max-Margin Weight Learning for Markov Logic Networks
    [Details] [PDF]
    Tuyen N. Huynh and Raymond J. Mooney
    In Proceedings of the International Workshop on Statistical Relational Learning (SRL-09), Leuven, Belgium, July 2009.
    Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing discriminative weight learning methods for MLNs all try to learn weights that optimize the Conditional Log Likelihood (CLL) of the training examples. In this work, we present a new discriminative weight learning method for MLNs based on a max-margin framework. This results in a new model, Max-Margin Markov Logic Networks (M3LNs), that combines the expressiveness of MLNs with the predictive accuracy of structural Support Vector Machines (SVMs). To train the proposed model, we design a new approximation algorithm for loss-augmented inference in MLNs based on Linear Programming (LP). The experimental result shows that the proposed approach generally achieves higher F1 scores than the current best discriminative weight learner for MLNs.
    ML ID: 232
  75. Speeding up Inference In Statistical Relational Learning by Clustering Similar Query Literals
    [Details] [PDF]
    Lilyana Mihalkova and Matthew Richardson
    In Proceedings of the 19th International Conference on Inductive Logic Programming (ILP-09), Leuven, Belgium, July 2009.
    Markov logic networks (MLNs) have been successfully applied to several challenging problems by taking a programming language approach where a set of formulas is hand-coded and weights are learned from data. Because inference plays an important role in this process, programming with an MLN would be significantly facilitated by speeding up inference. We present a new meta-inference algorithm that exploits the repeated structure frequently present in relational domains to speed up existing inference techniques. Our approach first clusters the query literals and then performs full inference for only one representative from each cluster. The clustering step incurs only a one-time up-front cost when weights are learned over a fixed structure.
    ML ID: 231
  76. Learning a Compositional Semantic Parser using an Existing Syntactic Parser
    [Details] [PDF] [Slides]
    Ruifang Ge and Raymond J. Mooney
    In Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2009), 611--619, Suntec, Singapore, August 2009.
    We present a new approach to learning a semantic parser (a system that maps natural language sentences into logical form). Unlike previous methods, it exploits an existing syntactic parser to produce disambiguated parse trees that drive the compositional semantic interpretation. The resulting system produces improved results on standard corpora on natural language interfaces for database querying and simulated robot control.
    ML ID: 229
  77. Probabilistic Abduction using Markov Logic Networks
    [Details] [PDF] [Slides]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the IJCAI-09 Workshop on Plan, Activity, and Intent Recognition (PAIR-09), Pasadena, CA, July 2009.
    Abduction is inference to the best explanation of a given set of evidence. It is important for plan or intent recognition systems. Traditional approaches to abductive reasoning have either used first-order logic, which is unable to reason under uncertainty, or Bayesian networks, which can handle uncertainty using probabilities but cannot directly handle an unbounded number of related entities. This paper proposes a new method for probabilistic abductive reasoning that combines the capabilities of first-order logic and graphical models by using Markov logic networks. Experimental results on a plan recognition task demonstrate the effectiveness of this method.
    ML ID: 228
  78. Transfer Learning from Minimal Target Data by Mapping across Relational Domains
    [Details] [PDF]
    Lilyana Mihalkova and Raymond Mooney
    In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09), 1163--1168, Pasadena, CA, July 2009.
    A central goal of transfer learning is to enable learning when training data from the domain of interest is limited. Yet, work on transfer across relational domains has so far focused on the case where there is a significant amount of target data. This paper bridges this gap by studying transfer when the amount of target data is minimal and consists of information about just a handful of entities. In the extreme case, only a single entity is known. We present the SR2LR algorithm that finds an effective mapping of predicates from a source model to the target domain in this setting and thus renders pre-existing knowledge useful to the target task. We demonstrate SR2LR's effectiveness in three benchmark relational domains on social interactions and study its behavior as information about an increasing number of entities becomes available.
    ML ID: 227
  79. Using Closed Captions to Train Activity Recognizers that Improve Video Retrieval
    [Details] [PDF]
    Sonal Gupta and Raymond Mooney
    In Proceedings of the CVPR-09 Workshop on Visual and Contextual Learning from Annotated Images and Videos (VCL), Miami, FL, June 2009.
    Recognizing activities in real-world videos is a difficult problem exacerbated by background clutter, changes in camera angle & zoom, rapid camera movements etc. Large corpora of labeled videos can be used to train automated activity recognition systems, but this requires expensive human labor and time. This paper explores how closed captions that naturally accompany many videos can act as weak supervision that allows automatically collecting labeled data for activity recognition. We show that such an approach can improve activity retrieval in soccer videos. Our system requires no manual labeling of video clips and needs minimal human supervision. We also present a novel caption classifier that uses additional linguistic information to determine whether a specific comment refers to an ongoing activity. We demonstrate that combining linguistic analysis and automatically trained activity recognizers can significantly improve the precision of video retrieval.
    ML ID: 226
  80. Semi-supervised graph clustering: a kernel approach
    [Details] [PDF]
    Brian Kulis, Sugato Basu, Inderjit Dhillon, and Raymond Mooney
    Machine Learning Journal, 74(1):1-22, 2009.
    Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We first show that a recently-proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective (Dhillon et al., in Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining, 2004a). A recent theoretical connection between weighted kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For graph data, this result leads to algorithms for optimizing several new semi-supervised graph clustering objectives. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., in Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semisupervised algorithms on both vector-based and graph-based data sets.
    ML ID: 224
  81. Search Query Disambiguation from Short Sessions
    [Details] [PDF]
    Lilyana Mihalkova and Raymond Mooney
    In Beyond Search: Computational Intelligence for the Web Workshop at NIPS, 2008.
    Web searches tend to be short and ambiguous. It is therefore not surprising that Web query disambiguation is an actively researched topic. However, most existing work relies on the existence of search engine log data in which each user's search activities are recorded over long periods of time. Such approaches may raise privacy concerns and may be difficult to implement for pragmatic reasons. In this work, we present an approach to Web query disambiguation that bases its predictions only on a short glimpse of user search activity, captured in a brief session of about 5--6 previous searches on average. Our method exploits the relations of the current search session in which the ambiguous query is issued to previous sessions in order to predict the user's intentions and is based on Markov logic. We present empirical results that demonstrate the effectiveness of our proposed approach on data collected form a commercial general-purpose search engine.
    ML ID: 225
  82. A Dependency-based Word Subsequence Kernel
    [Details] [PDF]
    Rohit J. Kate
    In Proceedings of the conference on Empirical Methods in Natural Language Processing (EMNLP-2008), 400--409, Waikiki, Honolulu, Hawaii, October 2008.
    This paper introduces a new kernel which computes similarity between two natural language sentences as the number of paths shared by their dependency trees. The paper gives a very efficient algorithm to compute it. This kernel is also an improvement over the word subsequence kernel because it only counts linguistically meaningful word subsequences which are based on word dependencies. It overcomes some of the difficulties encountered by syntactic tree kernels as well. Experimental results demonstrate the advantage of this kernel over word subsequence and syntactic tree kernels.
    ML ID: 223
  83. Transforming Meaning Representation Grammars to Improve Semantic Parsing
    [Details] [PDF]
    Rohit J. Kate
    In Proceedings of the Twelfth Conference on Computational Natural Language Learning (CoNLL-2008), 33--40, Manchester, UK, August 2008.
    A semantic parser learning system learns to map natural language sentences into their domain-specific formal meaning representations, but if the constructs of the meaning representation language do not correspond well with the natural language then the system may not learn a good semantic parser. This paper presents approaches for automatically transforming a meaning representation grammar (MRG) to conform it better with the natural language semantics. It introduces grammar transformation operators and meaning representation macros which are applied in an error-driven manner to transform an MRG while training a semantic parser learning system. Experimental results show that the automatically transformed MRGs lead to better learned semantic parsers which perform comparable to the semantic parsers learned using manually engineered MRGs.
    ML ID: 222
  84. Watch, Listen & Learn: Co-training on Captioned Images and Videos
    [Details] [PDF]
    Sonal Gupta, Joohyun Kim, Kristen Grauman and Raymond Mooney
    In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 457--472, Antwerp Belgium, September 2008.
    Recognizing visual scenes and activities is challenging: often visual cues alone are ambiguous, and it is expensive to obtain manually labeled examples from which to learn. To cope with these constraints, we propose to leverage the text that often accompanies visual data to learn robust models of scenes and actions from partially labeled collections. Our approach uses co-training, a semi-supervised learning method that accommodates multi-modal views of data. To classify images, our method learns from captioned images of natural scenes; and to recognize human actions, it learns from videos of athletic events with commentary. We show that by exploiting both multi-modal representations and unlabeled data our approach learns more accurate image and video classifiers than standard baseline algorithms.
    ML ID: 221
  85. Discriminative Structure and Parameter Learning for Markov Logic Networks
    [Details] [PDF] [Slides]
    Tuyen N. Huynh and Raymond J. Mooney
    In Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, July 2008.
    Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing methods for learning the logical structure of an MLN are not discriminative; however, many relational learning problems involve specific target predicates that must be inferred from given background information. We found that existing MLN methods perform very poorly on several such ILP benchmark problems, and we present improved discriminative methods for learning MLN clauses and weights that outperform existing MLN and traditional ILP methods.
    ML ID: 220
  86. Learning to Sportscast: A Test of Grounded Language Acquisition
    [Details] [PDF] [Slides] [Video]
    David L. Chen and Raymond J. Mooney
    In Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, July 2008.
    We present a novel commentator system that learns language from sportscasts of simulated soccer games. The system learns to parse and generate commentaries without any engineered knowledge about the English language. Training is done using only ambiguous supervision in the form of textual human commentaries and simulation states of the soccer games. The system simultaneously tries to establish correspondences between the commentaries and the simulation states as well as build a translation model. We also present a novel algorithm, Iterative Generation Strategy Learning (IGSL), for deciding which events to comment on. Human evaluations of the generated commentaries indicate they are of reasonable quality compared to human commentaries.
    ML ID: 219
  87. Transfer Learning by Mapping with Minimal Target Data
    [Details] [PDF]
    Lilyana Mihalkova and Raymond J. Mooney
    In Proceedings of the AAAI-08 Workshop on Transfer Learning For Complex Tasks, Chicago, IL, July 2008.
    This paper introduces the single-entity-centered setting for transfer across two relational domains. In this setting, target domain data contains information about only a single entity. We present the SR2LR algorithm that finds an effective mapping of the source model to the target domain in this setting and demonstsrate its effectiveness in three relational domains. Our experiments additionally show that the most accurate model for the source domain is not always the best model to use for transfer.
    ML ID: 218
  88. Learning to Connect Language and Perception
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), 1598--1601, Chicago, IL, July 2008. Senior Member Paper.
    To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Current natural language processing and computer vision systems make extensive use of machine learning to acquire the probabilistic knowledge needed to comprehend linguistic and visual input. However, to date, there has been relatively little work on learning the relationships between the two modalities. In this talk, I will review some of the existing work on learning to connect language and perception, discuss important directions for future research in this area, and argue that the time is now ripe to make a concerted effort to address this important, integrative AI problem.
    ML ID: 216
  89. Improving Learning of Markov Logic Networks using Transfer and Bottom-Up Induction
    [Details] [PDF]
    Lilyana Mihalkova
    Technical Report UT-AI-TR-07-341, Artificial Intelligence Lab, University of Texas at Austin, Austin, TX, May 2007.
    Statistical relational learning (SRL) algorithms combine ideas from rich knowledge representations, such as first-order logic, with those from probabilistic graphical models, such as Markov networks, to address the problem of learning from multi-relational data. One challenge posed by such data is that individual instances are frequently very large and include complex relationships among the entities. Moreover, because separate instances do not follow the same structure and contain varying numbers of entities, they cannot be effectively represented as a feature-vector. SRL models and algorithms have been successfully applied to a wide variety of domains such as social network analysis, biological data analysis, and planning, among others. Markov logic networks (MLNs) are a recently-developed SRL model that consists of weighted first-order clauses. MLNs can be viewed as templates that define Markov networks when provided with the set of constants present in a domain. MLNs are therefore very powerful because they inherit the expressivity of first-order logic. At the same time, MLNs can flexibly deal with noisy or uncertain data to produce probabilistic predictions for a set of propositions. MLNs have also been shown to subsume several other popular SRL models.

    The expressive power of MLNs comes at a cost: structure learning, or learning the first-order clauses of the model, is a very computationally intensive process that needs to sift through a large hypothesis space with many local maxima and plateaus. It is therefore an important research problem to develop learning algorithms that improve the speed and accuracy of this process. The main contribution of this proposal are two algorithms for learning the structure of MLNs that proceed in a more data-driven fashion, in contrast to most existing SRL algorithms. The first algorithm we present, R-TAMAR, improves learning by transferring the structure of an MLN learned in a domain related to the current one. It first diagnoses the transferred structure and then focuses its efforts only on the regions it determines to be incorrect. Our second algorithm, BUSL improves structure learning from scratch by approaching the problem in a more bottom-up fashion and first constructing a variablized Markov network template that significantly constrains the space of viable clause candidates. We demonstrate the effectiveness of our methods in three social domains.

    Our proposed future work directions include testing BUSL in additional domains and extending it so that it can be used not only to learn from scratch, but also to revise a provided MLN structure. Our most ambitious long-term goal is to develop a system that transfers knowledge from multiple potential sources. An important prerequisite to such a system is a method for measuring the similarity between domains. We would also like to extend BUSL to learn other SRL models and to handle functions.

    ML ID: 217
  90. Learning for Semantic Parsing with Kernels under Various Forms of Supervision
    [Details] [PDF] [Slides]
    Rohit J. Kate
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2007. 159 pages.
    Semantic parsing involves deep semantic analysis that maps natural language sentences to their formal executable meaning representations. This is a challenging problem and is critical for developing computing systems that understand natural language input. This thesis presents a new machine learning approach for semantic parsing based on string-kernel-based classification. It takes natural language sentences paired with their formal meaning representations as training data. For every production in the formal language grammar, a Support-Vector Machine (SVM) classifier is trained using string similarity as the kernel. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these classifiers. This method does not use any hard-matching rules and unlike previous and other recent methods, does not use grammar rules for natural language, probabilistic or otherwise, which makes it more robust to noisy input.

    Besides being robust, this approach is also flexible and able to learn under a wide range of supervision, from extra to weaker forms of supervision. It can easily utilize extra supervision given in the form of syntactic parse trees for natural language sentences by using a syntactic tree kernel instead of a string kernel. Its learning algorithm can also take advantage of detailed supervision provided in the form of semantically augmented parse trees. A simple extension using transductive SVMs enables the system to do semi-supervised learning and improve its performance utilizing unannotated sentences which are usually easily available. Another extension involving EM-like retraining makes the system capable of learning under ambiguous supervision in which the correct meaning representation for each sentence is not explicitly given, but instead a set of possible meaning representations is given. This weaker and more general form of supervision is better representative of a natural training environment for a language-learning system requiring minimal human supervision.

    For a semantic parser to work well, conformity between natural language and meaning representation grammar is necessary. However meaning representation grammars are typically designed to best suit the application which will use the meaning representations with little consideration for how well they correspond to natural language semantics. We present approaches to automatically transform meaning representation grammars to make them more compatible with natural language semantics and hence more suitable for learning semantic parsers. Finally, we also show that ensembles of different semantic parser learning systems can obtain the best overall performance.

    ML ID: 215
  91. Learning for Semantic Parsing and Natural Language Generation Using Statistical Machine Translation Techniques
    [Details] [PDF]
    Yuk Wah Wong
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2007. 188 pages. Also appears as Technical Report AI07-343, Artificial Intelligence Lab, University of Texas at Austin, August 2007.
    One of the main goals of natural language processing (NLP) is to build automated systems that can understand and generate human languages. This goal has so far remained elusive. Existing hand-crafted systems can provide in-depth analysis of domain sub-languages, but are often notoriously fragile and costly to build. Existing machine-learned systems are considerably more robust, but are limited to relatively shallow NLP tasks.

    In this thesis, we present novel statistical methods for robust natural language understanding and generation. We focus on two important sub-tasks, semantic parsing and tactical generation. The key idea is that both tasks can be treated as the translation between natural languages and formal meaning representation languages, and therefore, can be performed using state-of-the-art statistical machine translation techniques. Specifically, we use a technique called synchronous parsing, which has been extensively used in syntax-based machine translation, as the unifying framework for semantic parsing and tactical generation. The parsing and generation algorithms learn all of their linguistic knowledge from annotated corpora, and can handle natural-language sentences that are conceptually complex.

    A nice feature of our algorithms is that the semantic parsers and tactical generators share the same learned synchronous grammars. Moreover, charts are used as the unifying language-processing architecture for efficient parsing and generation. Therefore, the generators are said to be the inverse of the parsers, an elegant property that has been widely advocated. Furthermore, we show that our parsers and generators can handle formal meaning representation languages containing logical variables, including predicate logic.

    Our basic semantic parsing algorithm is called WASP. Most of the other parsing and generation algorithms presented in this thesis are extensions of WASP or its inverse. We demonstrate the effectiveness of our parsing and generation algorithms by performing experiments in two real-world, restricted domains. Experimental results show that our algorithms are more robust and accurate than the currently best systems that require similar supervision. Our work is also the first attempt to use the same automatically-learned grammar for both parsing and generation. Unlike previous systems that require manually-constructed grammars and lexicons, our systems require much less knowledge engineering and can be easily ported to other languages and domains.

    ML ID: 214
  92. Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction
    [Details] [PDF]
    Razvan Constantin Bunescu
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2007. 150 pages. Also as Technical Report AI07-345, Artificial Intelligence Lab, University of Texas at Austin, August 2007.
    Information Extraction, the task of locating textual mentions of specific types of entities and their relationships, aims at representing the information contained in text documents in a structured format that is more amenable to applications in data mining, question answering, or the semantic web. The goal of our research is to design information extraction models that obtain improved performance by exploiting types of evidence that have not been explored in previous approaches. Since designing an extraction system through introspection by a domain expert is a laborious and time consuming process, the focus of this thesis will be on methods that automatically induce an extraction model by training on a dataset of manually labeled examples.

    Named Entity Recognition is an information extraction task that is concerned with finding textual mentions of entities that belong to a predefined set of categories. We approach this task as a phrase classification problem, in which candidate phrases from the same document are collectively classified. Global correlations between candidate entities are captured in a model built using the expressive framework of Relational Markov Networks. Additionally, we propose a novel tractable approach to phrase classification for named entity recognition based on a special Junction Tree representation.

    Classifying entity mentions into a predefined set of categories achieves only a partial disambiguation of the names. This is further refined in the task of Named Entity Disambiguation, where names need to be linked to their actual denotations. In our research, we use Wikipedia as a repository of named entities and propose a ranking approach to disambiguation that exploits learned correlations between words from the name context and categories from the Wikipedia taxonomy.

    Relation Extraction refers to finding relevant relationships between entities mentioned in text documents. Our approaches to this information extraction task differ in the type and the amount of supervision required. We first propose two relation extraction methods that are trained on documents in which sentences are manually annotated for the required relationships. In the first method, the extraction patterns correspond to sequences of words and word classes anchored at two entity names occurring in the same sentence. These are used as implicit features in a generalized subsequence kernel, with weights computed through training of Support Vector Machines. In the second approach, the implicit extraction features are focused on the shortest path between the two entities in the word-word dependency graph of the sentence. Finally, in a significant departure from previous learning approaches to relation extraction, we propose reducing the amount of required supervision to only a handful of pairs of entities known to exhibit or not exhibit the desired relationship. Each pair is associated with a bag of sentences extracted automatically from a very large corpus. We extend the subsequence kernel to handle this weaker form of supervision, and describe a method for weighting features in order to focus on those correlated with the target relation rather than with the individual entities. The resulting Multiple Instance Learning approach offers a competitive alternative to previous relation extraction methods, at a significantly reduced cost in human supervision.

    ML ID: 213
  93. Learning to Extract Relations from the Web using Minimal Supervision
    [Details] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL'07), Prague, Czech Republic, June 2007.
    We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.
    ML ID: 204
  94. Mapping and Revising Markov Logic Networks for Transfer Learning
    [Details] [PDF]
    Lilyana Mihalkova, Tuyen N. Huynh, Raymond J. Mooney
    In Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07), 608-614, Vancouver, BC, July 2007.
    Transfer learning addresses the problem of how to leverage knowledge acquired in a source domain to improve the accuracy and speed of learning in a related target domain. This paper considers transfer learning with Markov logic networks (MLNs), a powerful formalism for learning in relational domains. We present a complete MLN transfer system that first autonomously maps the predicates in the source MLN to the target domain and then revises the mapped structure to further improve its accuracy. Our results in several real-world domains demonstrate that our approach successfully reduces the amount of time and training data needed to learn an accurate model of a target domain over learning from scratch.
    ML ID: 203
  95. Bottom-Up Learning of Markov Logic Network Structure
    [Details] [PDF]
    Lilyana Mihalkova and Raymond J. Mooney
    In Proceedings of 24th International Conference on Machine Learning (ICML-2007), Corvallis, OR, June 2007.
    Markov logic networks (MLNs) are a statistical relational model that consists of weighted first-order clauses and generalizes first-order logic and Markov networks. The current state-of-the-art algorithm for learning MLN structure follows a top-down paradigm where many potential candidate structures are systematically generated without considering the data and then evaluated using a statistical measure of their fit to the data. Even though this existing algorithm outperforms an impressive array of benchmarks, its greedy search is susceptible to local maxima or plateaus. We present a novel algorithm for learning MLN structure that follows a more bottom-up approach to address this problem. Our algorithm uses a ``propositional'' Markov network learning method to construct ``template'' networks that guide the construction of candidate clauses. Our algorithm significantly improves accuracy and learning time over the existing top-down approach in three real-world domains.
    ML ID: 202
  96. Multiple Instance Learning for Sparse Positive Bags
    [Details] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In Proceedings of the 24th Annual International Conference on Machine Learning (ICML-2007), Corvallis, OR, June 2007.
    We present a new approach to multiple instance learning (MIL) that is particularly effective when the positive bags are sparse (i.e. contain few positive instances). Unlike other SVM-based MIL methods, our approach more directly enforces the desired constraint that at least one of the instances in a positive bag is positive. Using both artificial and real-world data, we experimentally demonstrate that our approach achieves greater accuracy than state-of-the-art MIL methods when positive bags are sparse, and performs competitively when they are not. In particular, our approach is the best performing method for image region classification.
    ML ID: 201
  97. Learning Language Semantics from Ambiguous Supervision
    [Details] [PDF]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-07), 895-900, Vancouver, Canada, July 2007.
    This paper presents a method for learning a semantic parser from ambiguous supervision. Training data consists of natural language sentences annotated with multiple potential meaning representations, only one of which is correct. Such ambiguous supervision models the type of supervision that can be more naturally available to language-learning systems. Given such weak supervision, our approach produces a semantic parser that maps sentences into meaning representations. An existing semantic parsing learning system that can only learn from unambiguous supervision is augmented to handle ambiguous supervision. Experimental results show that the resulting system is able to cope up with ambiguities and learn accurate semantic parsers.
    ML ID: 200
  98. Learning Synchronous Grammars for Semantic Parsing with Lambda Calculus
    [Details] [PDF]
    Yuk Wah Wong and Raymond J. Mooney
    In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007), Prague, Czech Republic, June 2007.
    This paper presents the first empirical results to our knowledge on learning synchronous grammars that generate logical forms. Using statistical machine translation techniques, a semantic parser based on a synchronous context-free grammar augmented with lambda-operators is learned given a set of training sentences and their correct logical forms. The resulting parser is shown to be the best-performing system so far in a database query domain.
    ML ID: 199
  99. Semi-Supervised Learning for Semantic Parsing using Support Vector Machines
    [Details] [PDF] [Slides]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers (NAACL/HLT-2007), 81--84, Rochester, NY, April 2007.
    We present a method for utilizing unannotated sentences to improve a semantic parser which maps natural language (NL) sentences into their formal meaning representations (MRs). Given NL sentences annotated with their MRs, the initial supervised semantic parser learns the mapping by training Support Vector Machine (SVM) classifiers for every production in the MR grammar. Our new method applies the learned semantic parser to the unannotated sentences and collects unlabeled examples which are then used to retrain the classifiers using a variant of transductive SVMs. Experimental results show the improvements obtained over the purely supervised parser, particularly when the annotated training set is small.
    ML ID: 198
  100. Generation by Inverting a Semantic Parser That Uses Statistical Machine Translation
    [Details] [PDF]
    Yuk Wah Wong and Raymond J. Mooney
    In Proceedings of Human Language Technologies: The Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT-07), 172-179, Rochester, NY, 2007.
    This paper explores the use of statistical machine translation (SMT) methods for tactical natural language generation. We present results on using phrase-based SMT for learning to map meaning representations to natural language. Improved results are obtained by inverting a semantic parser that uses SMT methods to map sentences into meaning representations. Finally, we show that hybridizing these two approaches results in still more accurate generation systems. Automatic and human evaluation of generated sentences are presented across two domains and four languages.
    ML ID: 197
  101. Learning for Semantic Parsing
    [Details] [PDF]
    Raymond J. Mooney
    In A. Gelbukh, editors, Computational Linguistics and Intelligent Text Processing: Proceedings of the 8th International Conference (CICLing 2007), 311--324, Mexico City, Mexico, February 2007. Springer: Berlin, Germany. Invited paper.
    Semantic parsing is the task of mapping a natural language sentence into a complete, formal meaning representation. Over the past decade, we have developed a number of machine learning methods for inducing semantic parsers by training on a corpus of sentences paired with their meaning representations in a specified formal language. We have demonstrated these methods on the automated construction of natural-language interfaces to databases and robot command languages. This paper reviews our prior work on this topic and discusses directions for future research.
    ML ID: 196
  102. Extracting Relations from Text: From Word Sequences to Dependency Paths
    [Details] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In A. Kao and S. Poteet, editors, Natural Language Processing and Text Mining, 29-44, Berlin, 2007. Springer Verlag.
    ML ID: 186
  103. Statistical Relational Learning for Natural Language Information Extraction
    [Details] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning, 535-552, Cambridge, MA, 2007. MIT Press.
    Understanding natural language presents many challenging problems that lend themselves to statistical relational learning (SRL). Historically, both logical and probabilistic methods have found wide application in natural language processing (NLP). NLP inevitably involves reasoning about an arbitrary number of entities (people, places, and things) that have an unbounded set of complex relationships between them. Representing and reasoning about unbounded sets of entities and relations has generally been considered a strength of predicate logic. However, NLP also requires integrating uncertain evidence from a variety of sources in order to resolve numerous syntactic and semantic ambiguities. Effectively integrating multiple sources of uncertain evidence has generally been considered a strength of Bayesian probabilistic methods and graphical models. Consequently, NLP problems are particularly suited for SRL methods that combine the strengths of first-order predicate logic and probabilistic graphical models. In this article, we review our recent work on using Relational Markov Networks (RMNs) for information extraction, the problem of identifying phrases in natural language text that refer to specific types of entities. We use the expressive power of RMNs to represent and reason about several specific relationships between candidate entities and thereby collectively identify the appropriate set of phrases to extract. We present experiments on learning to extract protein names from biomedical text, which demonstrate the advantage of this approach over existing IE methods.
    ML ID: 165
  104. Adaptive Blocking: Learning to Scale Up Record Linkage
    [Details] [PDF]
    Mikhail Bilenko, Beena Kamath, Raymond J. Mooney
    In Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM-06), 87--96, Hong Kong, December 2006.
    Many data mining tasks require computing similarity between pairs of objects. Pairwise similarity computations are particularly important in record linkage systems, as well as in clustering and schema mapping algorithms. Because the number of object pairs grows quadratically with the size of the dataset, computing similarity between all pairs is impractical and becomes prohibitive for large datasets and complex similarity functions. Blocking methods alleviate this problem by efficiently selecting approximately similar object pairs for subsequent distance computations, leaving out the remaining pairs as dissimilar. Previously proposed blocking methods require manually constructing an indexbased similarity function or selecting a set of predicates, followed by hand-tuning of parameters. In this paper, we introduce an adaptive framework for automatically learning blocking functions that are efficient and accurate. We describe two predicate-based formulations of learnable blocking functions and provide learning algorithms for training them. The effectiveness of the proposed techniques is demonstrated on real and simulated datasets, on which they prove to be more accurate than non-adaptive blocking methods.
    ML ID: 195
  105. Fast and Effective Worm Fingerprinting via Machine Learning
    [Details] [PDF]
    Stewart Yang, Jianping Song, Harish Rajamani, Taewon Cho, Yin Zhang and Raymond Mooney
    In Proceedings of the 3rd IEEE International Conference on Autonomic Computing (ICAC-2006), Dublin, Ireland, June 2006. Poster Session.
    As Internet worms become ever faster and more sophisticated, it is important to be able to extract worm signatures in an accurate and timely manner. In this paper, we apply machine learning to automatically fingerprint polymorphic worms, which are able to change their appearance across every instance. Using real Internet traces and synthetic polymorphic worms, we evaluated the performance of several advanced machine learning algorithms, including naive Bayes, decision-tree induction, rule learning (RIPPER), and support vector machines. The results are very promising. Compared with Polygraph, the state of the art in polymorphic worm fingerprinting, several machine learning algorithms are able to generate more accurate signatures, tolerate more noise in the training data, and require much shorter training time. These results open the possibility of applying machine learning to build a fast and accurate online worm fingerprinting system.
    ML ID: 194
  106. Learnable Similarity Functions and Their Application to Record Linkage and Clustering
    [Details] [PDF]
    Mikhail Bilenko
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2006. 136 pages.
    Many machine learning and data mining tasks depend on functions that estimate similarity between instances. Similarity computations are particularly important in clustering and information integration applications, where pairwise distances play a central role in many algorithms. Typically, algorithms for these tasks rely on pre-defined similarity measures, such as edit distance or cosine similarity for strings, or Euclidean distance for vector-space data. However, standard distance functions are frequently suboptimal as they do not capture the appropriate notion of similarity for a particular domain, dataset, or application.

    In this thesis, we present several approaches for addressing this problem by employing learnable similarity functions. Given supervision in the form of similar or dissimilar pairs of instances, learnable similarity functions can be trained to provide accurate estimates for the domain and task at hand. We study the problem of adapting similarity functions in the context of several tasks: record linkage, clustering, and blocking. For each of these tasks, we present learnable similarity functions and training algorithms that lead to improved performance.

    In record linkage, also known as duplicate detection and entity matching, the goal is to identify database records referring to the same underlying entity. This requires estimating similarity between corresponding field values of records, as well as overall similarity between records. For computing field-level similarity between strings, we describe two learnable variants of edit distance that lead to improvements in linkage accuracy. For learning record-level similarity functions, we employ Support Vector Machines to combine similarities of individual record fields in proportion to their relative importance, yielding a high-accuracy linkage system. We also investigate strategies for efficient collection of training data which can be scarce due to the pairwise nature of the record linkage task.

    In clustering, similarity functions are essential as they determine the grouping of instances that is the goal of clustering. We describe a framework for integrating learnable similarity functions within a probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs). The framework accommodates learning various distance measures, including those based on Bregman divergences (e.g., parameterized Mahalanobis distance and parameterized KL-divergence), as well as directional measures (e.g., cosine similarity). Thus, it is applicable to a wide range of domains and data representations. Similarity functions are learned within the HMRF-KMeans algorithm derived from the framework, leading to significant improvements in clustering accuracy.

    The third application we consider, blocking, is critical in making record linkage and clustering algorithms scalable to large datasets, as it facilitates efficient selection of approximately similar instance pairs without explicitly considering all possible pairs. Previously proposed blocking methods require manually constructing a similarity function or a set of similarity predicates, followed by hand-tuning of parameters. We propose learning blocking functions automatically from linkage and semi-supervised clustering supervision, which allows automatic construction of blocking methods that are efficient and accurate. This approach yields computationally cheap learnable similarity functions that can be used for scaling up in a variety of tasks that rely on pairwise distance computations, including record linkage and clustering.

    ML ID: 193
  107. Learning Language from Perceptual Context: A Challenge Problem for AI
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 2006 AAAI Fellows Symposium, Boston, MA, July 2006.
    We present the problem of learning to understand natural language from examples of utterances paired only with their relevant real-world context as an important challenge problem for AI. Machine learning has been adopted as the most effective way of developing natural-language processing systems; however, currently, complex annotated corpora are required for training. By learning language from perceptual context, the need for laborious annotation is removed and the system's resulting understanding is grounded in its perceptual experience.
    ML ID: 192
  108. Using String-Kernels for Learning Semantic Parsers
    [Details] [PDF] [Slides]
    Rohit J. Kate and Raymond J. Mooney
    In ACL 2006: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL, 913-920, Morristown, NJ, USA, 2006. Association for Computational Linguistics.
    We present a new approach for mapping natural language sentences to their formal meaning representations using string-kernel-based classifiers. Our system learns these classifiers for every production in the formal language grammar. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these string classifiers. Our experiments on two real-world data sets show that this approach compares favorably to other existing systems and is particularly robust to noise.
    ML ID: 191
  109. Discriminative Reranking for Semantic Parsing
    [Details] [PDF]
    Ruifang Ge and Raymond J. Mooney
    In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL-06), Sydney, Australia, July 2006.
    Semantic parsing is the task of mapping natural language sentences to complete formal meaning representations. The performance of semantic parsing can be potentially improved by using discriminative reranking, which explores arbitrary global features. In this paper, we investigate discriminative reranking upon a baseline semantic parser, SCISSOR, where the composition of meaning representations is guided by syntax. We examine if features used for syntactic parsing can be adapted for semantic parsing by creating similar semantic features based on the mapping between syntax and semantics. We report experimental results on two real applications, an interpreter for coaching instructions in robotic soccer and a natural-language database interface. The results show that reranking can improve the performance on the coaching interpreter, but not on the database interface.
    ML ID: 190
  110. Transfer Learning with Markov Logic Networks
    [Details] [PDF]
    Lilyana Mihalkova and Raymond Mooney
    In Proceedings of the ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning, Pittsburgh, PA, June 2006.
    We propose a new algorithm for transfer learning of Markov Logic Network (MLN) structure. An important aspect of our approach is that it first diagnoses the provided source MLN and then focuses on re-learning only the incorrect portions. Experiments in a pair of synthetic domains demonstrate that this strategy significantly decreases the search space and speeds up learning while maintaining a level of accuracy comparable to that of the current best algorithm.
    ML ID: 189
  111. Integrating Co-occurrence Statistics with Information Extraction for Robust Retrieval of Protein Interactions from Medline
    [Details] [PDF]
    Razvan Bunescu, Raymond Mooney, Arun Ramani and Edward Marcotte
    In Proceedings of the HLT-NAACL Workshop on Linking Natural Language Processing and Biology (BioNLP'06), 49-56, New York, NY, June 2006.
    The task of mining relations from collections of documents is usually approached in two different ways. One type of systems do relation extraction from individual sentences, followed by an aggregation of the results over the entire collection. Other systems follow an entirely different approach, in which co-occurrence counts are used to determine whether the mentioning together of two entities is due to more than simple chance. We show that increased extraction performance can be obtained by combining the two approaches into an integrated relation extraction model.
    ML ID: 188
  112. Learning for Semantic Parsing with Statistical Machine Translation
    [Details] [PDF]
    Yuk Wah Wong and Raymond J. Mooney
    In Proceedings of Human Language Technology Conference / North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT-NAACL-06), 439-446, New York City, NY, 2006.
    We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of-the-art statistical machine translation techniques. A word alignment model is used for lexical acquisition, and the parsing model itself can be seen as a syntax-based translation model. We show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring similar amount of supervision, and shows better robustness to variations in task complexity and word order.
    ML ID: 187
  113. Using Encyclopedic Knowledge for Named Entity Disambiguation
    [Details] [PDF]
    Razvan Bunescu and Marius Pasca
    In Proceesings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), 9-16, Trento, Italy, 2006.
    We present a new method for detecting and disambiguating named entities in open domain text. A disambiguation SVM kernel is trained to exploit the high coverage and rich structure of the knowledge encoded in an online encyclopedia. The resulting model significantly outperforms a less informed baseline.
    ML ID: 185
  114. Learning Semantic Parsers Using Statistical Syntactic Parsing Techniques
    [Details] [PDF]
    Ruifang Ge
    2006. Doctoral Dissertation Proposal, University of Texas at Austin" , year="2006.
    Most recent work on semantic analysis of natural language has focused on ``shallow'' semantics such as word-sense disambiguation and semantic role labeling. Our work addresses a more ambitious task we call semantic parsing where natural language sentences are mapped to complete formal meaning representations. We present our system Scissor based on a statistical parser that generates a semantically-augmented parse tree (SAPT), in which each internal node has both a syntactic and semantic label. A compositional-semantics procedure is then used to map the augmented parse tree into a final meaning representation. Training the system requires sentences annotated with augmented parse trees. We evaluate the system in two domains, a natural-language database interface and an interpreter for coaching instructions in robotic soccer. We present experimental results demonstrating that Scissor produces more accurate semantic representations than several previous approaches on long sentences.
    In the future, we intend to pursue several directions in developing more accurate semantic parsing algorithms and automating the annotation process. This work will involve exploring alternative tree representations for better generalization in parsing. We also plan to apply discriminative reranking methods to semantic parsing, which allows exploring arbitrary, potentially correlated features not usable by the baseline learner. We also propose to design a method for automating the SAPT-generation process to alleviate the extra annotation work currently required for training Scissor. Finally, we will investigate the impact of different statistical syntactic parsers on semantic parsing using the automated SAPT-generation process.
    ML ID: 184
  115. Fast and Effective Worm Fingerprinting via Machine Learning
    [Details] [PDF]
    Stewart Yang, Jianping Song, Harish Rajamani, Taewon Cho, Yin Zhang and Raymond Mooney
    Technical Report AI-06-335, Artificial Intelligence Lab, The University of Texas at Austin, August 2006. This is a longer version of our ICAC-2006 paper.
    As Internet worms become ever faster and more sophisticated, it is important to be able to extract worm signatures in an accurate and timely manner. In this paper, we apply machine learning to automatically fingerprint polymorphic worms, which are able to change their appearance across every instance. Using real Internet traces and synthetic polymorphic worms, we evaluated the performance of several advanced machine learning algorithms, including naive Bayes, decision-tree induction, rule learning, and support vector machines. The results are very promising. Compared with Polygraph, the state of the art in polymorphic worm fingerprinting, several machine learning algorithms are able to generate more accurate signatures, tolerate more noise in the training data, and require much shorter training time. These results open the possibility of applying machine learning to build a fast and accurate online worm fingerprinting system.
    ML ID: 183
  116. Probabilistic Semi-Supervised Clustering with Constraints
    [Details] [PDF]
    Sugato Basu, Mikhail Bilenko, Arindam Banerjee and Raymond J. Mooney
    In O. Chapelle and B. Sch{"{o}}lkopf and A. Zien, editors, Semi-Supervised Learning, Cambridge, MA, 2006. MIT Press.
    In certain clustering tasks it is possible to obtain limited supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to same or different clusters. The resulting problem is known as semi-supervised clustering, an instance of semi-supervised learning stemming from a traditional unsupervised learning setting. Several algorithms exist for enhancing clustering quality by using supervision in the form of constraints. These algorithms typically utilize the pairwise constraints to either modify the clustering objective function or to learn the clustering distortion measure. This chapter describes an approach that employs Hidden Markov Random Fields (HMRFs) as a probabilistic generative model for semi-supervised clustering, thereby providing a principled framework for incorporating constraint-based supervision into prototype-based clustering. The HMRF-based model allows the use of a broad range of clustering distortion measures, including Bregman divergences (e.g., squared Euclidean distance, KL divergence) and directional distance measures (e.g., cosine distance), making it applicable to a number of domains. The model leads to the HMRF-KMeans algorithm which minimizes an objective function derived from the joint probability of the model, and allows unification of constraint-based and distance-based semi-supervised clustering methods. Additionally, a two-phase active learning algorithm for selecting informative pairwise constraints in a query-driven framework is derived from the HMRF model, facilitating improved clustering performance with relatively small amounts of supervision from the user.
    ML ID: 176
  117. Subsequence Kernels for Relation Extraction
    [Details] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    In Submitted to the Ninth Conference on Natural Language Learning (CoNLL-2005), Ann Arbor, MI, July 2006. Available at url{http://www.cs.utexas.edu/users/ml/publication/ie.html}.
    We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach.
    ML ID: 169
  118. Using Active Relocation to Aid Reinforcement Learning
    [Details] [PDF]
    Lilyana Mihalkova and Raymond Mooney
    In Prodeedings of the 19th International FLAIRS Conference (FLAIRS-2006), 580-585, Melbourne Beach, FL, May 2006.
    We propose a new framework for aiding a reinforcement learner by allowing it to relocate, or move, to a state it selects so as to decrease the number of steps it needs to take in order to develop an effective policy. The framework requires a minimal amount of human involvement or expertise and assumes a cost for each relocation. Several methods for taking advantage of the ability to relocate are proposed, and their effectiveness is tested in two commonly-used domains.
    ML ID: 166
  119. Creating Diverse Ensemble Classifiers to Reduce Supervision
    [Details] [PDF]
    Prem Melville
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, November 2005. 141 pages. Technical Report TR-05-49.
    Ensemble methods like Bagging and Boosting which combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. In this thesis, we present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-generated training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. The diverse ensembles produced by DECORATE are very effective for reducing the amount of supervision required for building accurate models. The first task we demonstrate this on is classification given a fixed training set. Experimental results using decision-tree induction as a base learner demonstrate that our approach consistently achieves higher predictive accuracy than the base classifier, Bagging and Random Forests. Also, DECORATE attains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. Additional experiments demonstrate DECORATE's resilience to imperfections in data, in the form of missing features, classification noise, and feature noise.
    DECORATE ensembles can also be used to reduce supervision through active learning, in which the learner selects the most informative examples from a pool of unlabeled examples, such that acquiring their labels will increase the accuracy of the classifier. Query by Committee is one effective approach to active learning in which disagreement within the ensemble of hypotheses is used to select examples for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting respectively, to build the committees. For efficient active learning it is critical that the committee be made up of consistent hypotheses that are very different from each other. Since DECORATE explicitly builds such committees, it is well-suited for this task. We introduce a new algorithm, Active-DECORATE, which uses DECORATE committees to select good training examples. Experimental results demonstrate that Active-DECORATE typically requires labeling fewer examples to achieve the same accuracy as Query by Bagging and Query by Boosting. Apart from optimizing classification accuracy, in many applications, producing good class probability estimates is also important, e.g., in fraud detection, which has unequal misclassification costs. This thesis introduces a novel approach to active learning based on Active-DECORATE which uses Jensen-Shannon divergence (a similarity measure for probability distributions) to improve the selection of training examples for optimizing probability estimation. Comprehensive experimental results demonstrate the benefits of our approach.
    Unlike the active learning setting, in many learning problems the class labels for all instances are known, but feature values may be missing and can be acquired at a cost. For building accurate predictive models, acquiring complete information for all instances is often quite expensive, while acquiring information for a random subset of instances may not be optimal. We formalize the task of active feature-value acquisition, which tries to reduce the cost of achieving a desired model accuracy by identifying instances for which obtaining complete information is most informative. We present an approach, based on DECORATE, in which instances are selected for acquisition based on the current model's accuracy and its confidence in the prediction. Experimental results demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions than random sampling.
    ML ID: 182
  120. A Kernel-based Approach to Learning Semantic Parsers
    [Details] [PDF] [Slides]
    Rohit J. Kate
    2005. Doctoral Dissertation Proposal, University of Texas at Austin.
    Semantic parsing involves deep semantic analysis that maps natural language sentences to their formal executable meaning representations. This is a challenging problem and is critical for developing user-friendly natural language interfaces to computing systems. Most of the research in natural language understanding, however, has mainly focused on shallow semantic analysis like case-role analysis or word sense disambiguation. The existing work in semantic parsing either lack the robustness of statistical methods or are applicable only to simple domains where semantic analysis is equivalent to filling a single semantic frame.

    In this proposal, we present a new approach to semantic parsing based on string-kernel-based classification. Our system takes natural language sentences paired with their formal meaning representations as training data. For every production in the formal language grammar, a Support-Vector Machine (SVM) classifier is trained using string similarity as the kernel. Each classifier then gives the probability of the production covering any given natural language string of words. These classifiers are further refined using EM-type iterations based on their performance on the training data. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these classifiers. Our experiments on two real-world data sets that have deep meaning representations show that this approach compares favorably to other existing systems in terms of accuracy and coverage.

    For future work, we propose to extend this approach so that it will also exploit the knowledge of natural language syntax by using the existing syntactic parsers. We also intend to broaden the scope of application domains, for example, domains where the sentences are noisy as typical in speech, or domains where corpora available for training do not have natural language sentences aligned with their unique meaning representations. We aim to test our system on the task of complex relation extraction as well. Finally, we also plan to investigate ways to combine our semantic parser with some recently developed semantic parsers to form committees in order to get the best overall performance.
    ML ID: 181
  121. Learning for Semantic Parsing Using Statistical Machine Translation Techniques
    [Details] [PDF]
    Yuk Wah Wong
    2005. Doctoral Dissertation Proposal, University of Texas at Austin.
    Semantic parsing is the construction of a complete, formal, symbolic meaning representation of a sentence. While it is crucial to natural language understanding, the problem of semantic parsing has received relatively little attention from the machine learning community. Recent work on natural language understanding has mainly focused on shallow semantic analysis, such as word- sense disambiguation and semantic role labeling. Semantic parsing, on the other hand, involves deep semantic analysis in which word senses, semantic roles and other components are combined to produce useful meaning representations for a particular application domain (e.g. database query). Prior research in machine learning for semantic parsing is mainly based on inductive logic programming or deterministic parsing, which lack some of the robustness that characterizes statistical learning. Existing statistical approaches to semantic parsing, however, are mostly concerned with relatively simple application domains in which a meaning representation is no more than a single semantic frame.

    In this proposal, we present a novel statistical approach to semantic parsing, WASP, which can handle meaning representations with a nested structure. The WASP algorithm learns a semantic parser given a set of sentences annotated with their correct meaning representations. The parsing model is based on the synchronous context-free grammar, where each rule maps a natural-language substring to its meaning representation. The main innovation of the algorithm is its use of state-of-the-art statistical machine translation techniques. A statistical word alignment model is used for lexical acquisition, and the parsing model itself can be seen as an instance of a syntax-based translation model. In initial evaluation on several real-world data sets, we show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring similar amount of supervision, and shows better robustness to variations in task complexity and word order.

    In future work, we intend to pursue several directions in developing accurate semantic parsers for a variety of application domains. This will involve exploiting prior knowledge about the natural-language syntax and the application domain. We also plan to construct a syntax-aware word-based alignment model for lexical acquisition. Finally, we will generalize the learning algorithm to handle context-dependent sentences and accept noisy training data.
    ML ID: 180
  122. An Expected Utility Approach to Active Feature-value Acquisition
    [Details] [PDF]
    P. Melville, M. Saar-Tsechansky, F. Provost and Raymond J. Mooney
    In Proceedings of the International Conference on Data Mining, 745-748, Houston, TX, November 2005.
    In many classification tasks training data have missing feature values that can be acquired at a cost. For building accurate predictive models, acquiring all missing values is often prohibitively expensive or unnecessary, while acquiring a random subset of feature values may not be most effective. The goal of active feature-value acquisition is to incrementally select feature values that are most cost-effective for improving the model's accuracy. We present an approach that acquires feature values for inducing a classification model based on an estimation of the expected improvement in model accuracy per unit cost. Experimental results demonstrate that our approach consistently reduces the cost of producing a model of a desired accuracy compared to random feature acquisitions.
    ML ID: 179
  123. Combining Bias and Variance Reduction Techniques for Regression
    [Details] [PDF]
    Yuk Lai Suen, Prem Melville and Raymond J. Mooney
    Technical Report UT-AI-TR-05-321, University of Texas at Austin, July 2005. www.cs.utexas.edu/~ml/publication.
    Gradient Boosting and bagging applied to regressors can reduce the error due to bias and variance respectively. Alternatively, Stochastic Gradient Boosting (SGB) and Iterated Bagging (IB) attempt to simultaneously reduce the contribution of both bias and variance to error. We provide an extensive empirical analysis of these methods, along with two alternate bias-variance reduction approaches --- bagging Gradient Boosting (BagGB) and bagging Stochastic Gradient Boosting (BagSGB). Experimental results demonstrate that SGB does not perform as well as IB or the alternate approaches. Furthermore, results show that, while BagGB and BagSGB perform competitively for low-bias learners, in general, Iterated Bagging is the most effective of these methods.
    ML ID: 178
  124. Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping
    [Details] [PDF]
    Mikhail Bilenko, Sugato Basu, and Mehran Sahami
    In Proceedings of the 5th International Conference on Data Mining (ICDM-2005), 58--65, Houston, TX, November 2005.
    The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical applications, e.g., elimination of duplicate records in databases and citation matching for scholarly articles. In this paper, we consider a new domain where the record linkage problem is manifested: Internet comparison shopping. We address the resulting linkage setting that requires learning a similarity function between record pairs from streaming data. The learned similarity function is subsequently used in clustering to determine which records are co-referent and should be linked. We present an online machine learning method for addressing this problem, where a composite similarity function based on a linear combination of basis functions is learned incrementally. We illustrate the efficacy of this approach on several real-world datasets from an Internet comparison shopping site, and show that our method is able to effectively learn various distance functions for product data with differing characteristics. We also provide experimental results that show the importance of considering multiple performance measures in record linkage evaluation.
    ML ID: 178
  125. Alignments and String Similarity in Information Integration: A Random Field Approach
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    In Proceedings of the 2005 Dagstuhl Seminar on Machine Learning for the Semantic Web, Dagstuhl, Germany, February 2005.
    Several problems central to information integration, such as ontology mapping and object matching, can be viewed as alignment tasks where the goal is to find an optimal correspondence between two structured objects and to compute the associated similarity score. The diversity of data sources and domains in the Semantic Web requires solutions to these problems to be highly adaptive, which can be achieved by employing probabilistic machine learning approaches. We present one such approach, Alignment Conditional Random Fields (ACRFs), a new framework for constructing and scoring sequence alignments using undirected graphical models. ACRFs allow incorporating arbitrary features into string edit distance computation, yielding a learnable string similarity function for use in tasks where approximate string matching is needed. We outline possible applications of ACRFs in information integration tasks and describe directions for future work.
    ML ID: 177
  126. A Shortest Path Dependency Kernel for Relation Extraction
    [Details] [PDF]
    R. C. Bunescu, and Raymond J. Mooney
    In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP-05), 724-731, Vancouver, BC, October 2005.
    We present a novel approach to relation extraction, based on the observation that the information required to assert a relationship between two named entities in the same sentence is typically captured by the shortest path between the two entities in the dependency graph. Experiments on extracting top-level relations from the ACE (Automated Content Extraction) newspaper corpus show that the new shortest path dependency kernel outperforms a recent approach based on dependency tree kernels.
    ML ID: 175
  127. Semi-supervised Clustering: Probabilistic Models, Algorithms and Experiments
    [Details] [PDF]
    Sugato Basu
    PhD Thesis, University of Texas at Austin, 2005.
    Clustering is one of the most common data mining tasks, used frequently for data categorization and analysis in both industry and academia. The focus of our research is on semi-supervised clustering, where we study how prior knowledge, gathered either from automated information sources or human supervision, can be incorporated into clustering algorithms. In this thesis, we present probabilistic models for semi-supervised clustering, develop algorithms based on these models and empirically validate their performances by extensive experiments on data sets from different domains, e.g., text analysis, hand-written character recognition, and bioinformatics.

    In many domains where clustering is applied, some prior knowledge is available either in the form of labeled data (specifying the category to which an instance belongs) or pairwise constraints on some of the instances (specifying whether two instances should be in same or different clusters). In this thesis, we first analyze effective methods of incorporating labeled supervision into prototype-based clustering algorithms, and propose two variants of the well-known KMeans algorithm that can improve their performance with limited labeled data.

    We then focus on the problem of semi-supervised clustering with constraints and show how this problem can be studied in the framework of a well-defined probabilistic generative model of a Hidden Markov Random Field. We derive an efficient KMeans-type iterative algorithm, HMRF-KMeans, for optimizing a semi-supervised clustering objective function defined on the HMRF model. We also give convergence guarantees of our algorithm for a large class of clustering distortion measures (e.g., squared Euclidean distance, KL divergence, and cosine distance).

    Finally, we develop an active learning algorithm for acquiring maximally informative pairwise constraints in an interactive query-driven framework, which to our knowledge is the first active learning algorithm for semi-supervised clustering with constraints.

    Other interesting problems of semi-supervised clustering that we discuss in this thesis include (1) semi-supervised graph-based clustering using kernels, (2) using prior knowledge to improve overlapping clustering of data, (3) integration of both constraint-based and distance-based semi-supervised clustering methods using the HMRF model, and (4) model selection techniques that use the available supervision to automatically select the right number of clusters.
    ML ID: 174
  128. Combining Bias and Variance Reduction Techniques for Regression
    [Details] [PDF]
    Y. L. Suen, P. Melville and Raymond J. Mooney
    In Proceedings of the 16th European Conference on Machine Learning, 741-749, Porto, Portugal, October 2005.
    Gradient Boosting and bagging applied to regressors can reduce the error due to bias and variance respectively. Alternatively, Stochastic Gradient Boosting (SGB) and Iterated Bagging (IB) attempt to simultaneously reduce the contribution of both bias and variance to error. We provide an extensive empirical analysis of these methods, along with two alternate bias-variance reduction approaches --- bagging Gradient Boosting (BagGB) and bagging Stochastic Gradient Boosting (BagSGB). Experimental results demonstrate that SGB does not perform as well as IB or the alternate approaches. Furthermore, results show that, while BagGB and BagSGB perform competitively for low-bias learners, in general, Iterated Bagging is the most effective of these methods.
    ML ID: 173
  129. Consolidating the Set of Known Human Protein-Protein Interactions in Preparation for Large-Scale Mapping of the Human Interactome
    [Details] [PDF]
    A.K. Ramani, R.C. Bunescu, Raymond J. Mooney and E.M. Marcotte
    Genome Biology, 6(5):r40, 2005.
    Background

    Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins.

    Results

    We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover from Medline abstracts 6,580 interactions among 3,737 human proteins. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields, then interactions were identified by the co-occurrence of protein names across the set of Medline abstracts, filtering the interactions with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets.

    Conclusion

    These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.
    ML ID: 172
  130. A Statistical Semantic Parser that Integrates Syntax and Semantics
    [Details] [PDF]
    Ruifang Ge and Raymond J. Mooney
    In Proceedings of CoNLL-2005, Ann Arbor, Michigan, June 2005.
    We introduce a learning semantic parser, Scissor, that maps natural-language sentences to a detailed, formal, meaning-representation language. It first uses an integrated statistical parser to produce a semantically augmented parse tree, in which each non-terminal node has both a syntactic and a semantic label. A compositional-semantics procedure is then used to map the augmented parse tree into a final meaning representation. We evaluate the system in two domains, a natural-language database interface and an interpreter for coaching instructions in robotic soccer. We present experimental results demonstrating that Scissor produces more accurate semantic representations than several previous approaches.
    ML ID: 171
  131. Mining Knowledge from Text Using Information Extraction
    [Details] [PDF]
    Raymond J. Mooney and R. Bunescu
    SIGKDD Explorations (special issue on Text Mining and Natural Language Processing), 7(1):3-10, 2005.
    An important approach to text mining involves the use of natural-language information extraction. Information extraction (IE) distills structured data or knowledge from unstructured text by identifying references to named entities as well as stated relationships between such entities. IE systems can be used to directly extricate abstract knowledge from a text corpus, or to extract concrete data from a set of documents which can then be further analyzed with traditional data-mining techniques to discover more general patterns. We discuss methods and implemented systems for both of these approaches and summarize results on mining real text corpora of biomedical abstracts, job announcements, and product descriptions. We also discuss challenges that arise when employing current information extraction technology to discover knowledge in text.
    ML ID: 170
  132. Economical Active Feature-value Acquisition through Expected Utility Estimation
    [Details] [PDF]
    P. Melville, M. Saar-Tsechansky, F. Provost and Raymond J. Mooney
    In Proceedings of the KDD-05 Workshop on Utility-Based Data Mining, 10-16, Chicago, IL, August 2005.
    In many classification tasks training data have missing feature values that can be acquired at a cost. For building accurate predictive models, acquiring all missing values is often prohibitively expensive or unnecessary, while acquiring a random subset of feature values may not be most effective. The goal of Active Feature-value Acquisition is to incrementally select feature values that are most cost-effective for improving the model's accuracy. We present two policies, Sampled Expected Utility and Expected Utility-ES, that acquire feature values for inducing a classification model based on an estimation of the expected improvement in model accuracy per unit cost. A comparison of the two policies to each other and to alternative policies demonstrate that Sampled Expected Utility is preferable as it effectively reduces the cost of producing a model of a desired accuracy and exhibits a consistent performance across domains.
    ML ID: 168
  133. Semi-supervised Graph Clustering: A Kernel Approach
    [Details] [PDF]
    B. Kulis, S. Basu, I. Dhillon and Raymond J. Mooney
    In Proceedings of the 22nd International Conference on Machine Learning, 457--464, Bonn, Germany, August 2005. (Distinguished Student Paper Award).
    Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most of the semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We show that a recently-proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective. A recent theoretical connection between kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.
    ML ID: 167
  134. Using Biomedical Literature Mining to Consolidate the Set of Known Human Protein-Protein Interactions
    [Details] [PDF]
    A. Ramani, E. Marcotte, R. Bunescu and Raymond J. Mooney
    In Proceedings of the ISMB/ACL-05 Workshop of the BioLINK SIG: Linking Literature, Information and Knowledge for Biology, Detroit, MI, June 2005.
    This paper presents the results of a large-scale effort to construct a comprehensive database of known human protein interactions by combining and linking known interactions from existing databases and then adding to them by automatically mining additional interactions from 750,000 Medline abstracts. The end result is a network of 31,609 interactions amongst 7,748 proteins. The text mining system first identifies protein names in the text using a trained Conditional Random Field (CRF) and then identifies interactions through a filtered co-citation analysis. We also report two new strategies for mining interactions, either by finding explicit statements of interactions in the text using learned pattern-based rules or a Support-Vector Machine using a string kernel. Using information in existing ontologies, the automatically extracted data is shown to be of equivalent accuracy to manually curated data sets.
    ML ID: 164
  135. Model-based Overlapping Clustering
    [Details] [PDF]
    A. Banerjee, C. Krumpelman, S. Basu, Raymond J. Mooney and Joydeep Ghosh
    In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-05), 2005.
    While the vast majority of clustering algorithms are partitional, many real world datasets have inherently overlapping clusters. The recent explosion of analysis on biological datasets, which are frequently overlapping, has led to new clustering models that allow hard assignment of data points to multiple clusters. One particularly appealing model was proposed by Segal et al. in the context of probabilistic relational models (PRMs) applied to the analysis of gene microarray data. In this paper, we start with the basic approach of Segal et al. and provide an alternative interpretation of the model as a generalization of mixture models, which makes it easily interpretable. While the original model maximized likelihood over constant variance Gaussians, we generalize it to work with any regular exponential family distribution, and corresponding Bregman divergences, thereby making the model applicable for a wide variety of clustering distance functions, e.g., KL-divergence, Itakura-Saito distance, I-divergence. The general model is applicable to several domains, including high-dimensional sparse domains, such as text and recommender systems. We additionally offer several algorithmic modifications that improve both the performance and applicability of the model. We demonstrate the effectiveness of our algorithm through experiments on synthetic data as well as subsets of 20-Newsgroups and EachMovie datasets.
    ML ID: 163
  136. Towards Self-Configuring Hardware for Distributed Computer Systems
    [Details] [PDF]
    Jonathan Wildstrom, Peter Stone, E. Witchel, Raymond Mooney and M. Dahlin
    In The Second International Conference on Autonomic Computing, 241-249, June 2005.
    High-end servers that can be partitioned into logical subsystems and repartitioned on the fly are now becoming available. This development raises the possibility of reconfiguring distributed systems online to optimize for dynamically changing workloads. This paper presents the initial steps towards a system that can learn to alter its current configuration in reaction to the current workload. In particular, the advantages of shifting CPU and memory resources online are considered. Investigation on a publically available multi-machine, multi-process distributed system (the online transaction processing benchmark TPC-W) indicates that there is a real performance benefit to reconfiguration in reaction to workload changes. A learning framework is presented that does not require any instrumentation of the middleware, nor any special instrumentation of the operating system; rather, it learns to identify preferable configurations as well as their quantitative performance effects from system behavior as reported by standard monitoring tools. Initial results using the WEKA machine learning package suggest that automatic adaptive configuration can provide measurable performance benefits over any fixed configuration.
    ML ID: 162
  137. Active Learning for Probability Estimation using Jensen-Shannon Divergence
    [Details] [PDF]
    P. Melville, S. M. Yang, M. Saar-Tsechansky and Raymond J. Mooney
    In Proceedings of the 16th European Conference on Machine Learning, 268--279, Porto, Portugal, October 2005.
    Active selection of good training examples is an important approach to reducing data-collection costs in machine learning; however, most existing methods focus on maximizing classification accuracy. In many applications, such as those with unequal misclassification costs, producing good class probability estimates (CPEs) is more important than optimizing classification accuracy. We introduce novel variations of two extant active-learning algorithms, Boostrap-LV and ACTIVEDECORATE, by using Jensen-Shannon divergence (a similarity measure for probability distributions) to improve sample selection for optimizing CPEs. Comprehensive experimental results demonstrate the benefits of our enhancements.
    ML ID: 161
  138. Learning to Transform Natural to Formal Languages
    [Details] [PDF] [Slides]
    Rohit J. Kate, Yuk Wah Wong and Raymond J. Mooney
    In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), 1062-1068, Pittsburgh, PA, July 2005.
    This paper presents a method for inducing transformation rules that map natural-language sentences into a formal query or command language. The approach assumes a formal grammar for the target representation language and learns transformation rules that exploit the non-terminal symbols in this grammar. The learned transformation rules incrementally map a natural-language sentence or its syntactic parse tree into a parse-tree for the target formal language. Experimental results are presented for two corpora, one which maps English instructions into an existing formal coaching language for simulated RoboCup soccer agents, and another which maps English U.S.-geography questions into a database query language. We show that our method performs overall better and faster than previous approaches in both domains.
    ML ID: 160
  139. Explaining Recommendations: Satisfaction vs. Promotion
    [Details] [PDF]
    Mustafa Bilgic and Raymond J. Mooney
    In Proceedings of Beyond Personalization 2005: A Workshop on the Next Stage of Recommender Systems Research at the 2005 International Conference on Intelligent User Interfaces, San Diego, CA, January 2005.
    Recommender systems have become a popular technique for helping users select desirable books, movies, music and other items. Most research in the area has focused on developing and evaluating algorithms for efficiently producing accurate recommendations. However, the ability to effectively explain its recommendations to users is another important aspect of a recommender system. The only previous investigation of methods for explaining recommendations showed that certain styles of explanations were effective at convincing users to adopt recommendations (i.e. promotion) but failed to show that explanations actually helped users make more accurate decisions (i.e. satisfaction). We present two new methods for explaining recommendations of content-based and/or collaborative systems and experimentally show that they actually improve user's estimation of item quality.
    ML ID: 156
  140. Learning for Collective Information Extraction
    [Details] [PDF]
    Razvan C. Bunescu
    Technical Report TR-05-02, Department of Computer Sciences, University of Texas at Austin, October 2005. Ph.D. proposal.
    An Information Extraction (IE) system analyses a set of documents with the aim of identifying certain types of entities and relations between them. Most IE systems treat separate potential extractions as independent. However, in many cases, considering influences between different candidate extractions could improve overall accuracy. For example, phrase repetitions inside a document are usually associated with the same entity type, the same being true for acronyms and their corresponding long form. One of our goals in this thesis is to show how these and potentially other types of correlations can be captured by a particular type of undirected probabilistic graphical model. Inference and learning using this graphical model allows for collective information extraction in a way that exploits the mutual influence between possible extractions. Preliminary experiments on learning to extract named entities from biomedical and newspaper text demonstrate the advantages of our approach.
    The benefit of doing collective classification comes however at a cost: in the general case, exact inference in the resulting graphical model has an exponential time complexity. The standard solution, which is also the one that we used in our initial work, is to resort to approximate inference. In this proposal we show that by considering only a selected subset of mutual influences between candidate extractions, exact inference can be done in linear time. Consequently, a short term goal is to run comparative experiments that would help us choose between the two approaches: exact inference with a restricted subset of mutual influences or approximate inference with the full set of influences.
    The set of issues that we intend to investigate in future work is two fold. One direction refers to applying the already developed framework to other natural language tasks that may benefit from the same types of influences, such as word sense disambiguation and part-of-speech tagging. Another direction concerns the design of a sufficiently general framework that would allow a seamless integration of cues from a variety of knowledge sources. We contemplate using generic sources such as external dictionaries, or web statistics on discriminative textual patterns. We also intend to alleviate the modeling problems due to the intrinsic local nature of entity features by exploiting syntactic information. All these generic features will be input to a feature selection algorithm, so that in the end we obtain a model which is both compact and accurate.
    ML ID: 155
  141. Comparative Experiments on Learning Information Extractors for Proteins and their Interactions
    [Details] [PDF]
    Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Edward M. Marcotte, Raymond J. Mooney, Arun Kumar Ramani, and Yuk Wah Wong
    Artificial Intelligence in Medicine (special issue on Summarization and Information Extraction from Medical Documents)(2):139-155, 2005.
    Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer-accessible form. This strategy is particularly attractive for extracting data relevant to genes of the human genome from the 11 million abstracts in Medline. However, extraction efforts have been frustrated by the lack of conventions for describing human genes and proteins. We have developed and evaluated a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting information on interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and hidden Markov models are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions more accurately than manually-developed rules.
    ML ID: 137
  142. Active Feature-Value Acquisition for Classifier Induction
    [Details] [PDF]
    Prem Melville, Maytal Saar-Tsechansky, Foster Provost, and Raymond J. Mooney
    Technical Report UT-AI-TR-04-311, Artificial Intelligence Lab, University of Texas at Austin, February 2004.
    Many induction problems, such as on-line customer profiling, include missing data that can be acquired at a cost, such as incomplete customer information that can be filled in by an intermediary. For building accurate predictive models, acquiring complete information for all instances is often prohibitively expensive or unnecessary. Randomly selecting instances for feature acquisition allows a representative sampling, but does not incorporate other value estimations of acquisition. Active feature-value acquisition aims at reducing the cost of achieving a desired model accuracy by identifying instances for which complete information is most informative to obtain. We present approaches in which instances are selected for feature acquisition based on the current model's ability to predict accurately and the model's confidence in its prediction. Experimental results on several real-world data sets demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions as compared to a baseline policy and a previously-published approach.
    ML ID: 158
  143. Active Feature-Value Acquisition for Classifier Induction
    [Details] [PDF]
    Prem Melville, Maytal Saar-Tsechansky, Foster Provost, and Raymond J. Mooney
    In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM-2004), 483-486, Brighton, UK, November 2004.
    Many induction problems, such as on-line customer profiling, include missing data that can be acquired at a cost, such as incomplete customer information that can be filled in by an intermediary. For building accurate predictive models, acquiring complete information for all instances is often prohibitively expensive or unnecessary. Randomly selecting instances for feature acquisition allows a representative sampling, but does not incorporate other value estimations of acquisition. Active feature-value acquisition aims at reducing the cost of achieving a desired model accuracy by identifying instances for which complete information is most informative to obtain. We present approaches in which instances are selected for feature acquisition based on the current model's ability to predict accurately and the model's confidence in its prediction. Experimental results on several real-world data sets demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions as compared to a baseline policy and a previously-published approach.
    ML ID: 157
  144. A Probabilistic Framework for Semi-Supervised Clustering
    [Details] [PDF]
    Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
    In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), 59-68, Seattle, WA, August 2004.
    Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to same or different clusters. In recent years, a number of algorithms have been proposed for enhancing clustering quality by employing such supervision. Such methods use the constraints to either modify the objective function, or to learn the distance measure. We propose a probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that provides a principled framework for incorporating supervision into prototype-based clustering. The model generalizes a previous approach that combines constraints and Euclidean distance learning, and allows the use of a broad range of clustering distortion measures, including Bregman divergences (e.g., Euclidean distance and I-divergence) and directional similarity measures (e.g., cosine similarity). We present an algorithm that performs partitional semi-supervised clustering of data by minimizing an objective function derived from the posterior energy of the HMRF model. Experimental results on several text data sets demonstrate the advantages of the proposed framework.
    ML ID: 154
  145. Text Mining with Information Extraction
    [Details] [PDF]
    Un Yong Nahm
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2004. 217 pages. Also appears as Technical Report UT-AI-TR-04-311.
    The popularity of the Web and the large number of documents available in electronic form has motivated the search for hidden knowledge in text collections. Consequently, there is growing research interest in the general topic of text mining. In this dissertation, we develop a text-mining system by integrating methods from Information Extraction (IE) and Data Mining (Knowledge Discovery from Databases or KDD). By utilizing existing IE and KDD techniques, text-mining systems can be developed relatively rapidly and evaluated on existing text corpora for testing IE systems.
    We present a general text-mining framework called DiscoTEX which employs an IE module for transforming natural-language documents into structured data and a KDD module for discovering prediction rules from the extracted data. When discovering patterns in extracted text, strict matching of strings is inadequate because textual database entries generally exhibit variations due to typographical errors, misspellings, abbreviations, and other sources. We introduce the notion of discovering ``soft-matching'' rules from text and present two new learning algorithms. TextRISE is an inductive method for learning soft-matching prediction rules that integrates rule-based and instance-based learning methods. Simple, interpretable rules are discovered using rule induction, while a nearest-neighbor algorithm provides soft matching. SoftApriori is a text-mining algorithm for discovering association rules from texts that uses a similarity measure to allow flexible matching to variable database items. We present experimental results on inducing prediction and association rules from natural-language texts demonstrating that TextRISE and SoftApriori learn more accurate rules than previous methods for these tasks. We also present an approach to using rules mined from extracted data to improve the accuracy of information extraction. Experimental results demonstate that such discovered patterns can be used to effectively improve the underlying IE method.
    ML ID: 153
  146. Collective Information Extraction with Relational Markov Networks
    [Details] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), 439-446, Barcelona, Spain, July 2004.
    Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences between different potential extractions could improve overall accuracy. Statistical methods based on undirected graphical models, such as conditional random fields (CRFs), have been shown to be an effective approach to learning accurate IE systems. We present a new IE method that employs Relational Markov Networks (a generalization of CRFs), which can represent arbitrary dependencies between extractions. This allows for ``collective information extraction'' that exploits the mutual influence between possible extractions. Experiments on learning to extract protein names from biomedical text demonstrate the advantages of this approach.
    ML ID: 152
  147. Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer
    [Details] [PDF]
    Gregory Kuhlmann, Peter Stone, Raymond J. Mooney, and Jude W. Shavlik
    In The AAAI-2004 Workshop on Supervisory Control of Learning and Adaptive Systems, July 2004.
    We describe our current efforts towards creating a reinforcement learner that learns both from reinforcements provided by its environment and from human-generated advice. Our research involves two complementary components: (a) mapping advice expressed in English to a formal advice language and (b) using advice expressed in a formal notation in a reinforcement learner. We use a subtask of the challenging RoboCup simulated soccer task as our testbed.
    ML ID: 151
  148. Using Soft-Matching Mined Rules to Improve Information Extraction
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), 27-32, San Jose, CA, July 2004.
    By discovering predictive relationships between different pieces of extracted data, data-mining algorithms can be used to improve the accuracy of information extraction. However, textual variation due to typos, abbreviations, and other sources can prevent the productive discovery and utilization of hard-matching rules. Recent methods for inducing soft-matching rules from extracted data can more effectively find and exploit predictive relationships in textual data. This paper presents techniques for using mined soft-matching association rules to increase the accuracy of information extraction. Experimental results on a corpus of computer-science job postings demonstrate that soft-matching rules improve information extraction more effectively than hard-matching rules.
    ML ID: 150
  149. Semi-supervised Clustering with Limited Background Knowledge
    [Details] [PDF]
    Sugato Basu
    In Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, 979--980, San Jose, CA, July 2004.
    ML ID: 149
  150. Learnable Similarity Functions and Their Applications to Clustering and Record Linkage
    [Details] [PDF]
    Mikhail Bilenko
    In Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, 981--982, San Jose, CA, July 2004.
    ML ID: 148
  151. Integrating Constraints and Metric Learning in Semi-Supervised Clustering
    [Details] [PDF]
    Mikhail Bilenko, Sugato Basu, and Raymond J. Mooney
    In Proceedings of 21st International Conference on Machine Learning (ICML-2004), 81-88, Banff, Canada, July 2004.
    Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has utilized supervised data in one of two approaches: 1) constraint-based methods that guide the clustering algorithm towards a better grouping of the data, and 2) distance-function learning methods that adapt the underlying similarity metric used by the clustering algorithm. This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. Experimental results demonstrate that the unified approach produces better clusters than both individual approaches as well as previously proposed semi-supervised clustering algorithms.
    ML ID: 147
  152. Diverse Ensembles for Active Learning
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    In Proceedings of 21st International Conference on Machine Learning (ICML-2004), 584-591, Banff, Canada, July 2004.
    Query by Committee is an effective approach to selective sampling in which disagreement amongst an ensemble of hypotheses is used to select data for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting, respectively, to build the committees. For effective active learning, it is critical that the committee be made up of consistent hypotheses that are very different from each other. DECORATE is a recently developed method that directly constructs such diverse committees using artificial training data. This paper introduces Active-Decorate, which uses Decorate committees to select good training examples. Extensive experimental results demonstrate that, in general, Active-DECORATE outperforms both Query by Bagging and Query by Boosting.
    ML ID: 146
  153. Relational Markov Networks for Collective Information Extraction
    [Details] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    In Proceedings of the ICML-04 Workshop on Statistical Relational Learning and its Connections to Other Fields, Banff, Alberta, July 2004.
    Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences between different potential extractions could improve overall accuracy. Statistical methods based on undirected graphical models, such as conditional random fields (CRFs), have been shown to be an effective approach to learning accurate IE systems. We present a new IE method that employs Relational Markov Networks, which can represent arbitrary dependencies between extractions. This allows for ``collective information extraction'' that exploits the mutual influence between possible extractions. Experiments on learning to extract protein names from biomedical text demonstrate the advantages of this approach.
    ML ID: 145
  154. A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields
    [Details] [PDF]
    Mikhail Bilenko and Sugato Basu
    In Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields (SRL-2004), Banff, Canada, July 2004.
    Recently, a number of methods have been proposed for semi-supervised clustering that employ supervision in the form of pairwise constraints. We describe a probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that incorporates relational supervision. The model leads to an EM-style clustering algorithm, the E-step of which requires collective assignment of instances to cluster centroids under the constraints. We evaluate three known techniques for such collective assignment: belief propagation, linear programming relaxation, and iterated conditional modes (ICM). The first two methods attempt to globally approximate the optimal assignment, while ICM is a greedy method. Experimental results indicate that global methods outperform the greedy approach when relational supervision is limited, while their benefits diminish as more pairwise constraints are provided.
    ML ID: 144
  155. Experiments on Ensembles with Missing and Noisy Data
    [Details] [PDF]
    Prem Melville, Nishit Shah, Lilyana Mihalkova, and Raymond J. Mooney
    In F. Roli, J. Kittler, and T. Windeatt, editors, {Lecture Notes in Computer Science:} Proceedings of the Fifth International Workshop on Multi Classifier Systems (MCS-2004), 293-302, Cagliari, Italy, June 2004. Springer Verlag.
    One of the potential advantages of multiple classifier systems is an increased robustness to noise and other imperfections in data. Previous experiments on classification noise have shown that bagging is fairly robust but that boosting is quite sensitive. DECORATE is a recently introduced ensemble method that constructs diverse committees using artificial data. It has been shown to generally outperform both boosting and bagging when training data is limited. This paper compares the sensitivity of bagging, boosting, and DECORATE to three types of imperfect data: missing features, classification noise, and feature noise. For missing data, DECORATE is the most robust. For classification noise, bagging and DECORATE are both robust, with bagging being slightly better than DECORATE, while boosting is quite sensitive. For feature noise, all of the ensemble methods increase the resilience of the base classifier.
    ML ID: 143
  156. Explanation for Recommender Systems: Satisfaction vs. Promotion
    [Details] [PDF]
    Mustafa Bilgic
    Austin, TX, May 2004. Undergraduate Honor Thesis, Department of Computer Sciences, University of Texas at Austin.
    There is much work done on Recommender Systems, systems that automate the recommendation process; however there is little work done on explaining recommendations. The only study we know did an experiment measuring which explanation system increased user's acceptance of the item how much (promotion). We took a different approach and measured which explanation system estimated the true quality of the item the best so that the user can be satisfied with the selection in the end (satisfaction).
    ML ID: 142
  157. Active Semi-Supervision for Pairwise Constrained Clustering
    [Details] [PDF]
    Sugato Basu, Arindam Banerjee, and Raymond J. Mooney
    In Proceedings of the 2004 SIAM International Conference on Data Mining (SDM-04), April 2004.
    Semi-supervised clustering uses a small amount of supervised data to aid unsupervised learning. One typical approach specifies a limited number of must-link and cannot-link constraints between pairs of examples. This paper presents a pairwise constrained clustering framework and a new method for actively selecting informative pairwise constraints to get improved clustering performance. The clustering and active learning methods are both easily scalable to large datasets, and can handle very high dimensional data. Experimental and theoretical results confirm that this active querying of pairwise constraints significantly improves the accuracy of clustering when given a relatively small amount of supervision.
    ML ID: 141
  158. Learning Transformation Rules for Semantic Parsing
    [Details] [PDF]
    Rohit J. Kate, Yuk Wah Wong, Ruifang Ge, and Raymond J. Mooney
    April 2004. Unpublished Technical Report.
    This paper presents an approach for inducing transformation rules that map natural-language sentences into a formal semantic representation language. The approach assumes a formal grammar for the target representation language and learns transformation rules that exploit the non-terminal symbols in this grammar. Patterns for the transformation rules are learned using an induction algorithm based on longest-common-subsequences previously developed for an information extraction system. Experimental results are presented on learning to map English coaching instructions for Robocup soccer into an existing formal language for coaching simulated robotic agents.
    ML ID: 140
  159. Creating Diversity in Ensembles Using Artificial Data
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    Journal of Information Fusion: Special Issue on Diversity in Multi Classifier Systems, 6(1):99-111, 2004.
    The diversity of an ensemble of classifiers is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than the base classifier, Bagging and Random Forests. DECORATE also obtains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets.
    ML ID: 139
  160. Learning Semantic Parsers: An Important But Under-Studied Problem
    [Details] [PDF]
    Raymond J. Mooney
    In Papers from the AAAI 2004 Spring Symposium on Language Learning: An Interdisciplinary Perspective, 39--44, Stanford, CA, March 2004.
    Computational systems that learn to transform natural-language sentences into semantic representations have important practical applications in building natural-language interfaces. They can also provide insight into important issues in human language acquisition. However, within AI, computational linguistics, and machine learning, there has been relatively little research on developing systems that learn such semantic parsers. This paper briefly reviews our own work in this area and presents semantic-parser acquistion as an important challenge problem for AI.
    ML ID: 138
  161. Relational Data Mining with Inductive Logic Programming for Link Discovery
    [Details] [PDF]
    Raymond J. Mooney, P. Melville, L. R. Tang, J. Shavlik, I. Dutra and D. Page
    Kargupta, H., Joshi, A., Sivakumar K., and Yesha, Y., editors, Data Mining: Next Generation Challenges and Future Directions:239--254, Menlo Park, CA, 2004. AAAI Press.
    Link discovery (LD) is an important task in data mining for counter-terrorism and is the focus of DARPA's Evidence Extraction and Link Discovery (EELD) research program. Link discovery concerns the identification of complex relational patterns that indicate potentially threatening activities in large amounts of relational data. Most data-mining methods assume data is in the form of a feature-vector (a single relational table) and cannot handle multi-relational data. Inductive logic programming is a form of relational data mining that discovers rules in first-order logic from multi-relational data. This paper discusses the application of ILP to learning patterns for link discovery.
    ML ID: 136
  162. Semisupervised Clustering for Intelligent User Management
    [Details] [PDF]
    Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
    In Proceedings of the IBM Austin Center for Advanced Studies 5th Annual Austin CAS Conference, Austin, TX, February 2004.
    Grouping users automatically based on their system usage can be beneficial in an autonomic computing environment. Clustering algorithms can generate meaningful user groups that provide important insights to system administrators about user profiles and group policies. In particular, if a small amount of supervision is provided by the administrator to the clustering process, semi-supervised clustering algorithms can use this supervision to generate clusters which are more useful for user management. In this work, we demonstrate the utility of semi-supervised clustering in intelligent user management. We collect publicly available system usage data of users in a university computing environment, and cluster the users using semi-supervised hierarchical agglomerative clustering based on the profile of the processes they run. Initial supervision is provided in the form of a few users running a specific process. Semi-supervised clustering gives us more meaningful clusters than unsupervised clustering in this domain, demonstrating that our technique can find interesting and useful groups in data with minimal user intervention.
    ML ID: 135
  163. Semi-supervised Clustering: Learning with Limited User Feedback
    [Details] [PDF]
    Sugato Basu
    Technical Report, Cornell University, 2004.
    In many machine learning domains (e.g. text processing, bioinformatics), there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate. Consequently, semi-supervised learning, learning from a combination of both labeled and unlabeled data, has become a topic of significant recent interest. In the proposed thesis, our research focus is on semi-supervised clustering, which uses a small amount of supervised data in the form of class labels or pairwise constraints on some examples to aid unsupervised clustering. Semi-supervised clustering can be either search-based, i.e., changes are made to the clustering objective to satisfy user-specified labels/constraints, or similarity-based, i.e., the clustering similarity metric is trained to satisfy the given labels/constraints. Our main goal in the proposed thesis is to study search-based semi-supervised clustering algorithms and apply them to different domains.
    In our initial work, we have shown how supervision can be provided to clustering in the form of labeled data points or pairwise constraints. We have also developed an active learning framework for selecting informative constraints in the pairwise constrained semi-supervised clustering model, and proposed a method for unifying search-based and similarity-based techniques in semi-supervised clustering.
    In this thesis, we want to study other aspects of semi-supervised clustering. Some of the issues we want to investigate include: (1) effect of noisy, probabilistic or incomplete supervision in clustering; (2) model selection techniques for automatic selection of number of clusters in semi-supervised clustering; (3) ensemble semi-supervised clustering. In our work so far, we have mainly focussed on generative clustering models, e.g. KMeans and EM, and ran experiments on clustering low-dimensional UCI datasets or high-dimensional text datasets. In future, we want to study the effect of semi-supervision on other clustering algorithms, especially in the discriminative clustering and online clustering framework. We also want to study the effectiveness of our semi-supervised clustering algorithms on other domains, e.g., web search engines (clustering of search results), astronomy (clustering of Mars spectral images) and bioinformatics (clustering of gene microarray data).
    ML ID: 134
  164. Text Mining with Information Extraction
    [Details] [PDF]
    Raymond J. Mooney and Un Yong Nahm
    In W. Daelemans and T. du Plessis and C. Snyman and L. Teck, editors, Multilingualism and Electronic Language Management: Proceedings of the 4th International MIDP Colloquium, 141-160, Bloemfontein, South Africa, September 2003. Van Schaik: South Africa.
    Text mining concerns looking for patterns in unstructured text. The related task of Information Extraction (IE) is about locating specific items in natural-language documents. This paper presents a framework for text mining, called DiscoTEX (Discovery from Text EXtraction), using a learned information extraction system to transform text into more structured data which is then mined for interesting relationships. The initial version of DiscoTEX integrates an IE module acquired by an IE learning system, and a standard rule induction module. In addition, rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of the underlying extraction system. Encouraging results are presented on applying these techniques to a corpus of computer job announcement postings from an Internet newsgroup.
    ML ID: 159
  165. Learnable Similarity Functions and Their Applications to Record Linkage and Clustering
    [Details] [PDF]
    Mikhail Bilenko
    2003. Doctoral Dissertation Proposal, University of Texas at Austin.
    Many machine learning tasks require similarity functions that estimate likeness between observations. Similarity computations are particularly important for clustering and record linkage algorithms that depend on accurate estimates of the distance between datapoints. However, standard measures such as string edit distance and Euclidean distance often fail to capture an appropriate notion of similarity for a particular domain or dataset. This problem can be alleviated by employing learnable similarity functions that adapt using training data. In this proposal, we introduce two adaptive string similarity measures: (1) Learnable Edit Distance with Affine Gaps, and (2) Learnable Vector-Space Similarity Based on Pairwise Classification. These similarity functions can be trained using a corpus of labeled pairs of equivalent and non-equivalent strings. We illustrate the accuracy improvements obtained with these measures using MARLIN, our system for record linkage in databases that learns to combine adaptive and static string similarity functions in a two-level learning framework.
    Obtaining useful training examples for learnable similarity functions can be problematic due to scarcity of informative similar and dissimilar object pairs. We propose two strategies, Static-Active Selection and Weakly-Labeled Negatives, that facilitate efficient training data collection for record linkage. These strategies significantly outperform random selection on real datasets without the computational cost of traditional active learning methods. Additionally, we describe a method for combining seeding with Euclidean distance learning for semi-supervised k-means clustering. Experimental evaluation demonstrates that our method outperforms unsupervised clustering and semi-supervised clustering that employs seeding or metric learning separately.
    In future research, we intend to pursue several directions in developing accurate learnable similarity functions and applying them to record linkage and clustering problems. This work will involve improving the proposed string similarity functions as well as introducing several novel approaches to adaptive string distance computation. We also plan to extend our initial work on learnable similarity functions for clustering, particularly for high-dimensional data. Finally, we will investigate the utility of various active learning strategies for learning similarity functions, as well as extend the preliminary work on static-active selection of training pairs.
    ML ID: 133
  166. Creating Diverse Ensemble Classifiers
    [Details] [PDF]
    Prem Melville
    Technical Report UT-AI-TR-03-306, Department of Computer Sciences, University of Texas at Austin, December 2003. Ph.D. proposal.
    Ensemble methods like Bagging and Boosting which combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than both the base classifier and Bagging. DECORATE also obtains higher accuracy than Boosting early in the learning curve when training data is limited.
    We propose to show that DECORATE can also be effectively used for (1) active learning, to reduce the number of training examples required to achieve high accuracy; (2) exploiting unlabeled data to improve accuracy in a semi-supervised learning setting; (3) combining active learning with semi-supervision for improved results; (4) obtaining better class membership probability estimates; (5) reducing the error of regressors; and (6) improving the accuracy of relational learners.
    ML ID: 132
  167. Adaptive Name-Matching in Information Integration
    [Details] [PDF]
    Mikhail Bilenko, William W. Cohen, Stephen Fienberg, Raymond J. Mooney, and Pradeep Ravikumar
    IEEE Intelligent Systems, 18(5):16-23, 2003.
    Identifying approximately duplicate database records that refer to the same entity is essential for information integration. The authors compare and describe methods for combining and learning textual similarity measures for name matching.
    ML ID: 131
  168. Integrating Top-down and Bottom-up Approaches in Inductive Logic Programming: Applications in Natural Language Processing and Relational Data Mining
    [Details] [PDF]
    Lappoon R. Tang
    PhD Thesis, Department of Computer Sciences, University of Texas, Austin, TX, August 2003.
    Inductive Logic Programming (ILP) is the intersection of Machine Learning and Logic Programming in which the learner's hypothesis space is the set of logic programs. There are two major ILP approaches: top-down and bottom-up. The former searches the hypothesis space from general to specific while the latter the other way round. Integrating both approaches has been demonstrated to be more effective. Integrated ILP systems were previously developed for two tasks: learning semantic parsers (Chillin), and mining relational data (Progol). Two new integrated ILP systems for these tasks that overcome limitations of existing methods will be presented.
    Cocktail is a new ILP algorithm for inducing semantic parsers. For this task, two features of a parse state, functional structure and context, provide important information for disambiguation. A bottom-up approach is more suitable for learning the former, while top-down is better for the latter. By allowing both approaches to induce program clauses and choosing the best combination of their results, Cocktail learns more effective parsers. Experimental results on learning natural-language interfaces for two databases demonstrate that it learns more accurate parsers than Chillin, the previous best method for this task.
    Beth is a new integrated ILP algorithm for relational data mining. The Inverse Entailment approach to ILP, implemented in the Progol and Aleph systems, starts with the construction of a bottom clause, the most specific hypothesis covering a seed example. When mining relational data with a large number of background facts, the bottom clause becomes intractably large, making learning very inefficient. A top-down approach heuristically guides the construction of clauses without building a bottom clause; however, it wastes time exploring clauses that cover no positive examples. By using a top-down approach to heuristically guide the construction of generalizations of a bottom clause, Beth combines the strength of both approaches. Learning patterns for detecting potential terrorist activity is a current challenge problem for relational data mining. Experimental results on artificial data for this task with over half a million facts show that Beth is significantly more efficient at discovering such patterns than Aleph and m-Foil, two leading ILP systems.
    ML ID: 130
  169. On Evaluation and Training-Set Construction for Duplicate Detection
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    In Proceedings of the KDD-03 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, 7-12, Washington, DC, August 2003.
    A variety of experimental methodologies have been used to evaluate the accuracy of duplicate-detection systems. We advocate presenting precision-recall curves as the most informative evaluation methodology. We also discuss a number of issues that arise when evaluating and assembling training data for adaptive systems that use machine learning to tune themselves to specific applications. We consider several different application scenarios and experimentally examine the effectiveness of alternative methods of collecting training data under each scenario. We propose two new approaches to collecting training data called static-active learning and weakly-labeled non-duplicates, and present experimental results on their effectiveness.
    ML ID: 129
  170. Scaling Up ILP to Large Examples: Results on Link Discovery for Counter-Terrorism
    [Details] [PDF]
    Lappoon R. Tang, Raymond J. Mooney, and Prem Melville
    In Proceedings of the KDD-2003 Workshop on Multi-Relational Data Mining (MRDM-2003), 107--121, Washington DC, August 2003.
    Inductive Logic Programming (ILP) has been shown to be a viable approach to many problems in multi-relational data mining (e.g. bioinformatics). Link discovery (LD) is an important task in data mining for counter-terrorism and is the focus of DARPA's program on Evidence Extraction and Link Discovery (EELD). Learning patterns for LD is a novel problem in relational data mining that is characterized by having an unprecedented number of background facts. As a result of the explosion in background facts, the efficiency of existing ILP algorithms becomes a serious limitation. This paper presents a new ILP algorithm that integrates top-down and bottom-up search in order to reduce search when processing large examples. Experimental results on EELD data confirm that it significantly improves efficiency over existing ILP methods.
    ML ID: 128
  171. Adaptive Duplicate Detection Using Learnable String Similarity Measures
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), 39-48, Washington, DC, August 2003.
    The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied on generic or manually tuned distance metrics for estimating the similarity of potential duplicates. In this paper, we present a framework for improving duplicate detection using trainable measures of textual similarity. We propose to employ learnable text distance functions for each database field, and show that such measures are capable of adapting to the specific notion of similarity that is appropriate for the field's domain. We present two learnable text similarity measures suitable for this task: an extended variant of learnable string edit distance, and a novel vector-space based measure that employs a Support Vector Machine (SVM) for training. Experimental results on a range of datasets show that our framework can improve duplicate detection accuracy over traditional techniques.
    ML ID: 127
  172. Learning to Extract Proteins and their Interactions from Medline Abstracts
    [Details] [PDF]
    Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Raymond J. Mooney, Yuk Wah Wong, Edward M. Marcotte, and Arun Kumar Ramani
    In Proceedings of the ICML-03 Workshop on Machine Learning in Bioinformatics, 46-53, Washington, DC, August 2003.
    We present results from a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and hidden Markov models are able to identify human proteins with higher accuracy than several previous approaches. We also demonstrate that various rule induction methods are able to identify protein interactions with higher precision than manually-developed rules.
    ML ID: 126
  173. Comparing and Unifying Search-Based and Similarity-Based Approaches to Semi-Supervised Clustering
    [Details] [PDF]
    Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
    In Proceedings of the ICML-2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 42-49, Washington, DC, 2003.
    Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has employed one of two approaches: 1) Search-based methods that utilize supervised data to guide the search for the best clustering, and 2) Similarity-based methods that use supervised data to adapt the underlying similarity metric used by the clustering algorithm. This paper presents a unified approach based on the K-Means clustering algorithm that incorporates both of these techniques. Experimental results demonstrate that the combined approach generally produces better clusters than either of the individual approaches.
    ML ID: 125
  174. Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    Journal of Machine Learning Research:177-210, 2003.
    Information Extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. We present a aystem, RAPIER, that uses pairs of sample documents and filled templates to induce pattern-match rules that directly extract fillers for the slots in the template. RAPIER employs a bottom-up learning algorithm which incorporates techniques from several inductive logic programming systems and acquires unbounded patterns that include constraints on the words, part-of-speech tags, and semantic classes present in the filler and the surrounding text. We present encouraging experimental results on two domains.
    ML ID: 124
  175. Employing Trainable String Similarity Metrics for Information Integration
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    In Proceedings of the IJCAI-03 Workshop on Information Integration on the Web, 67-72, Acapulco, Mexico, August 2003.
    The problem of identifying approximately duplicate records in databases is an essential step for the information integration processes. Most existing approaches have relied on generic or manually tuned distance metrics for estimating the similarity of potential duplicates. In this paper, we present a framework for improving duplicate detection using trainable measures of textual similarity. We propose to employ learnable text distance functions for each data field, and introduce an extended variant of learnable string edit distance based on an Expectation-Maximization(EM) training algorithm. Experimental results on a range of datasets show that this similarity metric is capable of adapting to the specific notions of similarity that are appropriate for different domains. Our overall system, MARLIN utilizes support vector machines to combine multiple similarity metrics, which are shown to perform better than ensembles of decision trees, which were employed for this task previously.
    ML ID: 123
  176. Constructing Diverse Classifier Ensembles Using Artificial Training Examples
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-2003), 505-510, Acapulco, Mexico, August 2003.
    Ensemble methods like bagging and boosting that combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. This paper presents a new method for generating ensembles that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than both the base classifier and bagging (whereas boosting can occasionally decrease accuracy), and also obtains higher accuracy than boosting early in the learning curve when training data is limited.
    ML ID: 122
  177. Acquiring Word-Meaning Mappings for Natural Language Interfaces
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    Journal of Artificial Intelligence Research, 18:1-44, 2003.
    This paper focuses on a system, Wolfie (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. Wolfie is part of an integrated system that learns to parse representations such as logical database queries.
    Experimental results are presented demonstrating Wolfie's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by Wolfie are compared to those acquired by a similar system developed by Siskind (1996), with results favorable to Wolfie. A second set of experiments demonstrates Wolfie's ability to scale to larger and more difficult, albeit artificially generated, corpora.
    In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods (Cohn, Atlas, & Ladner, 1994) attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.
    ML ID: 121
  178. Associative Anaphora Resolution: A Web-Based Approach
    [Details] [PDF]
    Razvan Bunescu
    In Proceedings of the EACL-2003 Workshop on the Computational Treatment of Anaphora, 47-52, Budapest, Hungary, 2003.
    We present a novel approach to solving definite descriptions in unrestricted text based on searching the web for a particular type of lexicosyntactic patterns. Using statistics on these patterns, we intend to recover the antecedents for a predefined subset of definite descriptions occurring in two types of anaphoric relations: identity anaphora and associative anaphora. Preliminary results obtained with this method are promising and compare well with other methods.
    ML ID: 120
  179. Machine Learning
    [Details] [PDF]
    Raymond J. Mooney
    New York, NY, 2003. McGraw-Hill.
    This chapter introduces symbolic machine learning in which decision trees, rules, or case-based classifiers are induced from supervised training examples. It describes the representation of knowledge assumed by each of these approaches and reviews basic algorithms for inducing such representations from annotated training examples and using the acquired knowledge to classify future instances. These techniques can be applied to learn knowledge required for a variety of problems in computational linguistics ranging from part-of-speech tagging and syntactic parsing to word-sense disambiguation and anaphora resolution. Applications to a variety of these problems are reviewed.
    ML ID: 119
  180. Property-Based Feature Engineering and Selection
    [Details] [PDF]
    Noppadon Kamolvilassatian
    Masters Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, December 2002. 85 pages.
    Example representation is a fundamental problem in machine learning. In particular, the decision on what features are extracted and selected to be included in the learning process significantly affects learning performance.
    This work proposes a novel framework for feature representation based on feature properties and applies it to the domain of textual information extraction. Our framework enables knowledge on feature engineering and selection to be explicitly learned and applied. The application of this knowledge can improve learning performance within the domain from which it is learned and in other domains with similar representational bias.
    We conducted several experiments comparing the performance of feature engineering and selection methods based on our framework with other approaches in the Information Extraction task. Results suggested that our approach performs either competitively or better than the best heuristic-based feature selection approach used. Moreover, our general framework can potentially be combined with other feature selection approaches to yield even better results.
    ML ID: 118
  181. Mining Soft-Matching Association Rules
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM-2002), 681-683, McLean, VA, November 2002.
    Variation and noise in database entries can prevent data mining algorithms, such as association rule mining, from discovering important regularities. In particular, textual fields can exhibit variation due to typographical errors, mispellings, abbreviations, etc.. By allowing partial or "soft matching" of items based on a similarity metric such as edit-distance or cosine similarity, additional important patterns can be detected. This paper introduces an algorithm, SoftApriori that discovers soft-matching association rules given a user-supplied similarity metric for each field. Experimental results on several "noisy" datasets extracted from text demonstrate that SoftApriori discovers additional relationships that more accurately reflect regularities in the data.
    ML ID: 117
  182. Relational Data Mining with Inductive Logic Programming for Link Discovery
    [Details] [PDF]
    Raymond J. Mooney, Prem Melville, Lappoon R. Tang, Jude Shavlik, Inês de Castro Dutra, David Page, and Vítor Santos Costa
    In Proceedings of the National Science Foundation Workshop on Next Generation Data Mining, Baltimore, MD, November 2002.
    Link discovery (LD) is an important task in data mining for counter-terrorism and is the focus of DARPA's Evidence Extraction and Link Discovery (EELD) research program. Link discovery concerns the identification of complex relational patterns that indicate potentially threatening activities in large amounts of relational data. Most data-mining methods assume data is in the form of a feature-vector (a single relational table) and cannot handle multi-relational data. Inductive logic programming is a form of relational data mining that discovers rules in first-order logic from multi-relational data. This paper discusses the application of ILP to learning patterns for link discovery.
    ML ID: 116
  183. Two Approaches to Handling Noisy Variation in Text Mining
    [Details] [PDF]
    Un Yong Nahm, Mikhail Bilenko, and Raymond J. Mooney
    In Papers from the Nineteenth International Conference on Machine Learning (ICML-2002) Workshop on Text Learning, 18-27, Sydney, Australia, July 2002.
    Variation and noise in textual database entries can prevent text mining algorithms from discovering important regularities. We present two novel methods to cope with this problem: (1) an adaptive approach to ``hardening'' noisy databases by identifying duplicate records, and (2) mining ``soft'' association rules. For identifying approximately duplicate records, we present a domain-independent two-level method for improving duplicate detection accuracy based on machine learning. For mining soft matching rules, we introduce an algorithm that discovers association rules by allowing partial matching of items based on a textual similarity metric such as edit distance or cosine similarity. Experimental results on real and synthetic datasets show that our methods outperform traditional techniques for noisy textual databases.
    ML ID: 115
  184. Content-Boosted Collaborative Filtering for Improved Recommendations
    [Details] [PDF]
    Prem Melville, Raymond J. Mooney, and Ramadass Nagarajan
    In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-02), 187-192, Edmonton, Alberta, 2002.
    Most recommender systems use Collaborative Filtering or Content-based methods to predict new items of interest for a user. While both methods have their own advantages, individually they fail to provide good recommendations in many situations. Incorporating components from both methods, a hybrid recommender system can overcome these shortcomings. In this paper, we present an elegant and effective framework for combining content and collaboration. Our approach uses a content-based predictor to enhance existing user data, and then provides personalized suggestions through collaborative filtering. We present experimental results that show how this approach, Content-Boosted Collaborative Filtering, performs better than a pure content-based predictor, pure collaborative filter, and a naive hybrid approach.
    ML ID: 114
  185. Semi-supervised Clustering by Seeding
    [Details] [PDF]
    Sugato Basu, Arindam Banerjee, and Raymond J. Mooney
    In Proceedings of 19th International Conference on Machine Learning (ICML-2002), 19-26, 2002.
    Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. This paper explores the use of labeled data to generate initial seed clusters, as well as the use of constraints generated from labeled data to guide the clustering process. It introduces two semi-supervised variants of KMeans clustering that can be viewed as instances of the EM algorithm, where labeled data provides prior information about the conditional distributions of hidden category labels. Experimental results demonstrate the advantages of these methods over standard random seeding and COP-KMeans, a previously developed semi-supervised clustering algorithm.
    ML ID: 113
  186. Text Mining with Information Extraction
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases, 60-67, Stanford, CA, March 2002.
    Text mining concerns looking for patterns in unstructured text. The related task of Information Extraction (IE) is about locating specific items in natural-language documents. This paper presents a framework for text mining, called DiscoTEX (Discovery from Text EXtraction), using a learned information extraction system to transform text into more structured data which is then mined for interesting relationships. The initial version of DiscoTEX integrates an IE module acquired by an IE learning system, and a standard rule induction module. However, this approach has problems when the same extracted entity or feature is represented by similar but not identical strings in different documents. Consequently, we also develop an alternate rule induction system called TextRISE, that allows for partial matching of textual items. Encouraging preliminary results are presented on applying these techniques to a corpus of Internet documents.
    ML ID: 112
  187. Extracting Gene and Protein Names from Biomedical Abstracts
    [Details] [PDF]
    Razvan Bunescu, Ruifang Ge, Raymond J. Mooney, Edward Marcotte, and Arun Kumar Ramani
    March 2002. Unpublished Technical Note.
    Automatically extracting information from biomedical text holds the promise of easily consolidating large amounts of biological knowledge in computer accessible form. We are investigating the use of information extraction techniques for processing biomedical text. Currently, we have focused on the initial stage of identifying information on interacting proteins, specifically the problem of recognizin protein and gene names with high precision. We present preliminary results on extracting protein names from Medline abstracts.
    ML ID: 111
  188. Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    Technical Report AI 02-296, Artificial Intelligence Laboratory, University of Texas at Austin, Austin, TX, February 2002.
    The problem of identifying approximately duplicate records in databases has previously been studied as record linkage, the merge/purge problem, hardening soft databases, and field matching. Most existing approaches have focused on efficient algorithms for locating potential duplicates rather than precise similarity metrics for comparing records. In this paper, we present a domain-independent method for improving duplicate detection accuracy using machine learning. First, trainable distance metrics are learned for each field, adapting to the specific notion of similarity that is appropriate for the field's domain. Second, a classifier is employed that uses several diverse metrics for each field as distance features and classifies pairs of records as duplicates or non-duplicates. We also propose an extended model of learnable string distance which improves over an existing approach. Experimental results on real and synthetic datasets show that our method outperforms traditional techniques.
    ML ID: 110
  189. ELIXIR: A Library for Writing Wrappers in Java
    [Details] [PDF]
    Edward Wild
    December 2001. Undergraduate Honor Thesis, Department of Computer Sciences, University of Texas at Austin.
    ELIXIR is a library for writing wrappers in Java. ELIXIR provides a way to combine text extraction and spidering in wrappers. Since wrappers using ELIXIR are Java programs, they are eays to integrate with other Java program. The user can also extend the functionality of ELIXIR by implement new ItemExtractors. In an experiment, a wrapper written using ELIXIR showed an 89% reduction in non-comment source statements from a wrapper written using a prototype of ELIXIR. In another experiemnt, a wrapper written using ELIXIR showed a 90% reduction in non-comment source statements from a wrapper written using SPHINX, a Java toolkit for writing spiders.
    ML ID: 109
  190. Content-Boosted Collaborative Filtering
    [Details] [PDF]
    Prem Melville, Raymond J. Mooney, and Ramadass Nagarajan
    In Proceedings of the SIGIR-2001 Workshop on Recommender Systems, New Orleans, LA, September 2001.
    Most recommender systems use Collaborative Filtering or Content-based methods to predict new items of interest for a user. While both methods have their own advantages, individually they fail to provide good recommendattions in many situations. Incorporating components from both methods, a hybrid recommender system can overcome these shortcomings. In this paper, we present an elegant and effective framework for combining content and collaboration. Our approach uses a content-based predictor to enhance existing user data, and then provides personalized suggestions through collaborative filtering. We present experimental results that show how this approach, Content-Boosted Collaborative Filtering, performs better than a pure content-based predictor, pure collaborative filter, and a naive hybrid approach. We also discuss methods to improve the performance of our hybrid system.
    ML ID: 108
  191. Using Multiple Clause Constructors in Inductive Logic Programming for Semantic Parsing
    [Details] [PDF]
    Lappoon R. Tang and Raymond J. Mooney
    In Proceedings of the 12th European Conference on Machine Learning, 466-477, Freiburg, Germany, 2001.
    In this paper, we explored a learning approach which combines different learning methods in inductive logic programming (ILP) to allow a learner to produce more expressive hypothese than that of each individual learner. Such a learning approach may be useful when the performance of the task depends on solving a large amount of classification problems and each has its own characteristics which may or may not fit a particular learning method. The task of sematnic parser acquisition in two different domains was attempted and preliminary results demonstrated that such an approach is promising.
    ML ID: 107
  192. Evaluating the Novelty of Text-Mined Rules using Lexical Knowledge
    [Details] [PDF]
    Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupuleti, and Joydeep Ghosh
    In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001), 233-239, San Francisco, CA, 2001.
    In this paper, we present a new method of estimating the novelty of rules discovered by data-mining methods using WordNet, a lexical knowledge-base of English words. We assess the novelty of a rule by the average semantic distance in a knowledge hierarchy between the words in the antecedent and the consequent of the rule -- the more the average distance, more is the novelty of the rule. The novelty of rules extracted by the DiscoTEX text-mining system on Amazon.com book descriptions were evaluated by both human subjects and by our algorithm. By computing correlation coefficients between pairs of human ratings and between human and automatic ratings, we found that the automatic scoring of rules based on our novelty measure correlates with human judgments about as well as human judgments correlate with one another.
    ML ID: 106
  193. Mining Soft-Matching Rules from Textual Data
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the 18th International Joint Conference on Artificial Intelligence, 2001.
    Text mining concerns the discovery of knowledge from unstructured textual data. One important task is the discovery of rules that relate specific words and phrases. Although existing methods for this task learn traditional logical rules, soft-matching methods that utilize word-frequency information generally work better for textual data. This paper presents a rule induction system, TextRISE, that allows for partial matching of text-valued features by combining rule-based and instance-based learning. We present initial experiments applying TextRISE to corpora of book descriptions and patent documents retrieved from the web and compare its results to those of traditional rule and instance based methods.
    ML ID: 105
  194. Using Lexical Knowlege to Evaluate the Novelty of Rules Mined from Text
    [Details] [PDF]
    Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupuleti, and Joydeep Ghosh
    In Proceedings of NAACL 2001 Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, 144--149, Pittsburg, PA, June 2001.
    We present a novel application of WordNet to estimating the interestingness of rules discovered by data-mining methods. We estimate the novelty of text-mined rules using semantic distance measures based on WordNet. In our experiments, we found that the automatic scoring of rules based on our novelty measure correlates with human judgments about as well as human judgments correlate with each other.
    ML ID: 104
  195. Text Mining with Information Extraction
    [Details] [PDF]
    Un Yong Nahm
    February 2001. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    Text mining is a relatively new research area at the intersection of data mining, natural-language processing, machine learning, and information retrieval. The goal of text mining is to discover knowledge in unstructured text. The related task of Information Extraction (IE) concerns locating specific items of data in natural-language documents, thereby transforming unstructured text into a structured database. Although handmade IE systems have existed for a while, automatic construction of information extraction systems using machine learning is more recent. This proposal presents a new framework for text mining, called DiscoTEX (Discovery from Text EXtraction), which uses a learned information extraction system to transform text into more structured data which is then mined for interesting relationships.
    DiscoTEX combines IE and standard data mining methods to perform text mining as well as improve the performance of the underlying IE system. It discovers prediction rules from natural-language corpora, and these rules are used to predict additional information to extract from future documents, thereby improving the recall of IE. The initial version of DiscoTex integrates an IE module acquired by the Rapier learning system, and a standard rule induction module such as C4.5rules or Ripper. Encouraging initial results are presented on applying these techniques to a corpus of computer job announcements posted on an Internet newsgroup. However, this approach has problems when the same extracted entity or feature is represented by similar but not identical strings in different documents. Consequently, we are also developing an alternate rule induction system for DiscoTex called, TextRISE, that allows for partial matching of string-valued features. We also present initial results applying the TextRISE rule learner to corpora of book descriptions and patent documents retrieved from the World Wide Web (WWW). Future research will involve thorough testing on several domains, further development of this approach, and extensions of the proposed framework (currently limited to prediction rule discovery) to additional text mining tasks.
    ML ID: 103
  196. Automated Construction of Database Interfaces: Integrating Statistical and Relational Learning for Semantic Parsing
    [Details] [PDF]
    Lappoon R. Tang and Raymond J. Mooney
    In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora(EMNLP/VLC-2000), 133-141, Hong Kong, October 2000.
    The development of natural language interfaces (NLI's) for databases has been a challenging problem in natural language processing (NLP) since the 1970's. The need for NLI's has become more pronounced due to the widespread access to complex databases now available through the Internet. A challenging problem for empirical NLP is the automated acquisition of NLI's from training examples. We present a method for integrating statistical and relational learning techniques for this task which exploits the strength of both approaches. Experimental results from three different domains suggest that such an approach is more robust than a previous purely logic-based approach.
    ML ID: 102
  197. Using Information Extraction to Aid the Discovery of Prediction Rules from Text
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000) Workshop on Text Mining, 51--58, Boston, MA, August 2000.
    Text mining and Information Extraction(IE) are both topics of significant recent interest. Text mining concerns applying data mining, a.k.a. knowledge discovery from databases (KDD) techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX, that combines IE and KDD methods to perform a text mining task, discovering prediction rules from natural-language corpora. An initial version of DiscoTEX is constructed by integrating an IE module based on Rapier and a rule-learning module, Ripper. We present encouraging results on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
    ML ID: 101
  198. A Mutually Beneficial Integration of Data Mining and Information Extraction
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), 627-632, Austin, TX, July 2000.
    Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX, that combines IE and data mining methodologies to perform text mining as well as improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
    ML ID: 100
  199. Integrating Statistical and Relational Learning for Semantic Parsing: Applications to Learning Natural Language Interfaces for Databases
    [Details] [PDF]
    Lappoon R. Tang
    May 2000. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    The development of natural language interfaces (NLIs) for databases has been an interesting problem in natural language processing since the 70's. The need for NLIs has become more pronounced given the widespread access to complex databases now available through the Internet. However, such systems are difficult to build and must be tailored to each application. A current research topic involves using machine learning methods to automate the development of NLI's. This proposal presents a method for learning semantic parsers (systems for mapping natural language to logical form) that integrates logic-based and probabilistic methods in order to exploit the complementary strengths of these competing approaches. More precisely, an inductive logic programming (ILP) method, TABULATE, is developed for learning multiple models that are integrated via linear weighted combination to produce probabilistic models for statistical semantic parsing. Initial experimental results from three different domains suggest that an integration of statistical and logical approaches to semantic parsing can outperform a purely logical approach. Future research will further develop this integrated approach and demonstrate its ability to improve the automated development of NLI's.
    ML ID: 99
  200. Content-Based Book Recommending Using Learning for Text Categorization
    [Details] [PDF]
    Raymond J. Mooney and Loriene Roy
    In Proceedings of the Fifth ACM Conference on Digital Libraries, 195-204, San Antonio, TX, June 2000.
    Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user's likes and dislikes. Most existing recommender systems use collaborative filtering methods that base recommendations on other users' preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommend previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations.
    ML ID: 98
  201. Integrating Abduction and Induction in Machine Learning
    [Details] [PDF]
    Raymond J. Mooney
    In P. A. Flach and A. C. Kakas, editors, Abduction and Induction, 181-191, 2000. Kluwer Academic Publishers.
    This article discusses the integration of traditional abductive and inductive reasoning methods in the development of machine learning systems. In particular, it reviews our recent work in two areas: 1) The use of traditional abductive methods to propose revisions during theory refinement, where an existing knowledge base is modified to make it consistent with a set of empirical data; and 2) The use of inductive learning methods to automatically acquire from examples a diagnostic knowledge base used for abductive reasoning. Experimental results on real-world problems are presented to illustrate the capabilities of both of these approaches to integrating the two forms of reasoning.
    ML ID: 97
  202. Learning for Semantic Interpretation: Scaling Up Without Dumbing Down
    [Details] [PDF]
    Raymond J. Mooney
    In Workshop Notes for the Workshop on Learning Language in Logic, 7-15, Bled, Slovenia, 2000.
    Most recent research in learning approaches to natural language have studied fairly ``low-level'' tasks such as morphology, part-of-speech tagging, and syntactic parsing. However, I believe that logical approaches may have the most relevance and impact at the level of semantic interpretation, where a logical representation of sentence meaning is important and useful. We have explored the use of inductive logic programming for learning parsers that map natural-language database queries into executable logical form. This work goes against the growing trend in computational linguistics of focusing on shallow but broad-coverage natural language tasks (``scaling up by dumbing down'') and instead concerns using logic-based learning to develop narrower, domain-specific systems that perform relatively deep processing. I first present a historical view of the shifting emphasis of research on various tasks in natural language processing and then briefly review our own work on learning for semantic interpretation. I will then attempt to encourage others to study such problems and explain why I believe logical approaches have the most to offer at the level of producing semantic interpretations of complete sentences.
    ML ID: 93
  203. Content-Based Book Recommending Using Learning for Text Categorization
    [Details] [PDF]
    Raymond J. Mooney and Loriene Roy
    In Proceedings of the SIGIR-99 Workshop on Recommender Systems: Algorithms and Evaluation, Berkeley, CA, August 1999.
    Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user's likes and dislikes. Most existing recommender systems use social filtering methods that base recommendations on other users' preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommended previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations. These experiments are based on ratings from random samplings of items and we discuss problems with previous experiments that employ skewed samples of user-selected examples to evaluate performance.
    ML ID: 96
  204. Automatic Construction of Semantic Lexicons for Learning Natural Language Interfaces
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), 487-493, Orlando, FL, July 1999.
    This paper describes a system, Wolfie (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of words paired with meaning representations. Wolfie is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating Wolfie's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by Wolfie are compared to those acquired by a competing system developed by Siskind.
    ML ID: 95
  205. Relational Learning of Pattern-Match Rules for Information Extraction
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), 328-334, Orlando, FL, July 1999.
    Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. This paper presents a system, Rapier, that takes pairs of sample documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. Rapier employs a bottom-up learning algorithm which incorporates techniques from several inductive logic programming systems and acquires unbounded patterns that include constraints on the words, part-of-speech tags, and semantic classes present in the filler and the surrounding text. We present encouraging experimental results on two domains.
    ML ID: 94
  206. Active Learning for Natural Language Parsing and Information Extraction
    [Details] [PDF]
    Cynthia A. Thompson, Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of the Sixteenth International Conference on Machine Learning (ICML-99), 406-414, Bled, Slovenia, June 1999.
    In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, existing results for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to two non-classification tasks in natural language processing: semantic parsing and information extraction. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance for these complex tasks.
    ML ID: 92
  207. Using HTML Structure and Linked Pages to Improve Learning for Text Categorization
    [Details] [PDF]
    Michael B. Cline
    Technical Report AI 98-270, Department of Computer Sciences, University of Texas at Austin, Austin, TX, May 1999. Undergraduate Honors Thesis.
    Classifying web pages is an important task in automating the organization of information on the WWW, and learning for text categorization can help automate the development of such systems. This project explores using two aspects of HTML to improve learning for text categorization: 1) Using HTML tags such as titles, links, and headings to partition the text on a page and 2) Using the pages linked from a given page to augment its description. Initial experimental results on 26 categories from the Yahoo hierarchy demonstrate the promise of these two methods for improving the accuracy of a bag-of-words text classifier using a simple Bayesian learning algorithm.
    ML ID: 91
  208. Semantic Lexicon Acquisition for Learning Natural Language Interfaces
    [Details] [PDF]
    Cynthia Ann Thompson
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, December 1998. 101 pages. Also appears as Technical Report AI 99-278, Artificial Intelligence Lab, University of Texas at Austin.
    A long-standing goal for the field of artificial intelligence is to enable computer understanding of human languages. A core requirement in reaching this goal is the ability to transform individual sentences into a form better suited for computer manipulation. This ability, called semantic parsing, requires several knowledge sources, such as a grammar, lexicon, and parsing mechanism.
    Building natural language parsing systems by hand is a tedious, error-prone undertaking. We build on previous research in automating the construction of such systems using machine learning techniques. The result is a combined system that learns semantic lexicons and semantic parsers from one common set of training examples. The input required is a corpus of sentence/representation pairs, where the representations are in the output format desired. A new system, Wolfie, learns semantic lexicons to be used as background knowledge by a previously developed parser acquisition system, Chill. The combined system is tested on a real world domain of answering database queries. We also compare this combination to a combination of Chill with a previously developed lexicon learner, demonstrating superior performance with our system. In addition, we show the ability of the system to learn to process natural languages other than English. Finally, we test the system on an alternate sentence representation, and on a set of large, artificial corpora with varying levels of ambiguity and synonymy.
    One difficulty in using machine learning methods for building natural language interfaces is building the required annotated corpus. Therefore, we also address this issue by using active learning to reduce the number of training examples required by both Wolfie and Chill. Experimental results show that the number of examples needed to reach a given level of performance can be significantly reduced with this method.
    ML ID: 90
  209. Semantic Lexicon Acquisition for Learning Natural Language Interfaces
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    In Proceedings of the Sixth Workshop on Very Large Corpora, Montreal, Quebec, Canada, August 1998. Also available as TR AI 98-273, Artificial Intelligence Lab, University of Texas at Austin, May 1998.
    This paper describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with representations of their meaning. The lexicon learned consists of words paired with meaning representations. WOLFIE is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by WOLFIE are compared to those acquired by a competing system developed by Siskind (1996).
    ML ID: 89
  210. Relational Learning Techniques for Natural Language Information Extraction
    [Details] [PDF]
    Mary Elaine Califf
    PhD Thesis, Department of Computer Sciences, University of Texas, Austin, TX, August 1998. 142 pages. Also appears as Artificial Intelligence Laboratory Technical Report AI 98-276.
    The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This dissertation presents a novel rule representation specific to natural language and a relational learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be extracted and learns pattern-matching rules to extract fillers for the slots in the template. The system is tested on several domains, showing its ability to learn rules for different tasks. Rapier's performance is compared to a propositional learning system for information extraction, demonstrating the superiority of relational learning for some information extraction tasks. Because one difficulty in using machine learning to develop natural language processing systems is the necessity of providing annotated examples to supervised learning systems, this dissertation also describes an attempt to reduce the number of examples Rapier requires by employing a form of active learning. Experimental results show that the number of examples required to achieve a given level of performance can be significantly reduced by this method.
    ML ID: 88
  211. Theory Refinement for Bayesian Networks with Hidden Variables
    [Details] [PDF]
    Sowmya Ramachandran and Raymond J. Mooney
    In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98), 454--462, Madison, WI, July 1998.
    While there has been a growing interest in the problem of learning Bayesian networks from data, no technique exists for learning or revising Bayesian networks with Hidden variables (i.e. variables not represented in the data), that has been shown to be efficient, effective, and scalable through evaluation on real data. The few techniques that exist for revising such networks perform a blind search through a large spaces of revisons, and are therefore computationally expensive. This paper presents BANNER, a technique for using data to revise a given bayesian network with noisy-or and noisy-and nodes, to improve its classification accuracy. The initial network can be derived directly from a logical theory expresssed as propositional rules. BANNER can revise networks with hidden variables, and add hidden variables when necessary. Unlike previous approaches, BANNER employs mechanisms similar to logical theory refinement techniques for using the data to focus the search for effective modifications. Experiments on real-world problems in the domain of molecular biology demonstrate that BANNER can effectively revise fairly large networks to significantly improve their accuracies.
    ML ID: 87
  212. Book Recommending Using Text Categorization with Extracted Information
    [Details] [PDF]
    Raymond J. Mooney, Paul N. Bennett, and Loriene Roy
    In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98)"-REC-WKSHP98, year="1998, 70-74, Madison, WI, 1998.
    Content-based recommender systems suggest documents, items, and services to users based on learning a profile of the user from rated examples containing information about the given items. Text categorization methods are very useful for this task but generally rely on unstructured text. We have developed a book-recommending system that utilizes semi-structured information about items gathered from the web using simple information extraction techniques. Initial experimental results demonstrate that this approach can produce fairly accurate recommendations.
    ML ID: 86
  213. Text Categorization Through Probabilistic Learning: Applications to Recommender Systems
    [Details] [PDF]
    Paul N. Bennett
    1998. Honors thesis, Department of Computer Sciences, The University of Texas at Austin.
    With the growth of the World Wide Web, recommender systems have received an increasing amount of attention. Many recommender systems in use today are based on collaborative filtering. This project has focused on LIBRA, a content-based book recommending system. By utilizing text categorization methods and the information available for each book, the system determines a user profile which is used as the basis of recommendations made to the user. Instead of the bag-of-words approach used in many other statistical text categorization approaches, LIBRA parses each text sample into a semi-structured representation. We have used standard Machine Learning techniques to analyze the performance of several algorithms on this learning task. In addition, we analyze the utility of several methods of feature construction and selection (i.e. methods of choosing the representation of an item that the learning algorithm actually uses). After analyzing the system we conclude that good recommendations are produced after a relatively small number of training examples. We also conclude that the feature selection method tested does not improve the performance of these algorithms in any systematic way, though the results indicate other feature selection methods may prove useful. Feature construction, however, while not providing a large increase in performance with the particular construction methods used here, holds promise of providing performance improvements for the algorithms investigated. This text assumes only minor familiarity with concepts of artificial intelligence and should be readable by the upper division computer science undergraduate familiar with basic concepts of probability theory and set theory.
    ML ID: 85
  214. Theory Refinement of Bayesian Networks with Hidden Variables
    [Details] [PDF]
    Sowmya Ramachandran and Raymond J. Mooney
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, May 1998. 139 pages. Also appears as Technical Report AI 98-265, Artificial Intelligence Lab, University of Texas at Austin.
    Research in theory refinement has shown that biasing a learner with initial, approximately correct knowledge produces more accurate results than learning from data alone. While techniques have been developed to revise logical and connectionist representations, little has been done to revise probabilistic representations. Bayesian networks are well-established as a sound formalism for representing and reasoning with probabilistic knowledge, and are widely used. There has been a growing interest in the problem of learning Bayesian networks from data. However, there is no existing technique for learning or revising Bayesian networks with hidden variables (i.e. variables not represented in the data) that has been shown to be efficient, effective, and scalable through evaluation on real data. The few techniques that exist for revising such networks perform a blind search through a large space of revisions, and are therefore computationally expensive. This dissertation presents Banner, a technique for using data to revise a giv en Bayesian network with Noisy-Or and Noisy-And nodes, to improve its classification accuracy. Additionally, the initial netwrk can be derived directly from a logical theory expressed as propositional Horn-clause rules. Banner can revise networks with hidden variables, and add hidden variables when necessary. Unlike previous approaches to this problem, Banner employs mechanisms similar to those used in logical theory refinement techniques for using the data to focus the search for effective modifications to the network. It can also be used to learn networks with hidden variables from data alone. We also introduce Banner-Pr, a technique for revising the parameters of a Bayesian network with Noisy-Or/And nodes, that directly exploits the computational efficiency afforded by these models. Experiments on several real-world learning problems in domains such as molecular biology and intelligent tutoring systems demonstrate that Banner can effectively and efficiently revise networks to significantly improve their accuracies, and thus learn highly accurate classifiers. Comparisons with the Naive Bayes algorithm show that using the theory refinement approach gives Banner a substantial edge over learning from data alone. We also show that Banner-Pr converges faster and produces more accurate classifiers than an existing algorithm for learning the parameters of a network.
    ML ID: 84
  215. An Experimental Comparison of Genetic Programming and Inductive Logic Programming on Learning Recursive List Functions
    [Details] [PDF]
    Lappoon R. Tang, Mary Elaine Califf, and Raymond J. Mooney
    Technical Report AI 98-271, Artificial Intelligence Lab, University of Texas at Austin, May 1998.
    This paper experimentally compares three approaches to program induction: inductive logic programming (ILP), genetic programming (GP), and genetic logic programming (GLP) (a variant of GP for inducing Prolog programs). Each of these methods was used to induce four simple, recursive, list-manipulation functions. The results indicate that ILP is the most likely to induce a correct program from small sets of random examples, while GP is generally less accurate. GLP performs the worst, and is rarely able to induce a correct program. Interpretations of these results in terms of differences in search methods and inductive biases are presented.
    ML ID: 83
  216. Advantages of Decision Lists and Implicit Negatives in Inductive Logic Programming
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    New Generation Computing, 16(3):263-281, 1998.
    This paper demonstrates the capabilities of FOIDL, an inductive logic programming (ILP) system whose distinguishing characteristics are the ability to produce first-order decision lists, the use of an output completeness assumption as a substitute for negative examples, and the use of intensional background knowledge. The development of FOIDL was originally motivated by the problem of learning to generate the past tense of English verbs; however, this paper demonstrates its superior performance on two different sets of benchmark ILP problems. Tests on the finite element mesh design problem show that FOIDL's decision lists enable it to produce generally more accurate results than a range of methods previously applied to this problem. Tests with a selection of list-processing problems from Bratko's introductory Prolog text demonstrate that the combination of implicit negatives and intensionality allow FOIDL to learn correct programs from far fewer examples than FOIL.
    ML ID: 82
  217. Using Multi-Strategy Learning to Improve Planning Efficiency and Quality
    [Details] [PDF]
    Tara A. Estlin
    PhD Thesis, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, 1998.
    Artificial intelligence planning systems have become an important tool for automating a wide variety of tasks. However, even the most current planning algorithms suffer from two major problems. First, they often require infeasible amounts of computation time to solve problems in most domains. And second, they are not guaranteed to return the best solution to a planning problem, and in fact can sometimes return very low-quality solutions. One way to address these problems is to provide a planning system with domain-specific control knowledge, which helps guide the planner towards more promising search paths. Machine learning techniques enable a planning system to automatically acquire search-control knowledge for different applications. A considerable amount of planning and learning research has been devoted to acquiring rules that improve planning efficiency, also known as speedup learning. Much less work has been down in learning knowledge to improve the quality of plans, even though this is an essential feature for many real-world planning systems. Furthermore, even less research has been done in acquiring control knowledge to improve both these metrics.

    The learning system presented in this dissertation, called SCOPE, is a unique approach to learning control knowledge for planning. SCOPE learns domain-specific control rules for a planner that improve both planning efficiency and plan quality, and it is one of the few systems that can learn control knowledge for partial-order planning. SCOPE's architecture integrates explanation-based learning (EBL) with techniques from inductive logic programming. Specifically, EBL is used to constrain an inductive search for control heuristics that help a planner choose between competing plan refinements. Since SCOPE uses a very flexible training approach, its learning algorithm can be easily focused to prefer search paths that are better for particular evaluation metrics. SCOPE is extensively tested on several planning domains, including a logistics transportation domain and a production manufacturing domain. In these tests, it is shown to significantly improve both planning efficiency and quality and is shown to be more robust than a competing approach.

    ML ID: 81
  218. Relational Learning of Pattern-Match Rules for Information Extraction
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 6-11, Standford, CA, March 1998.
    Information extraction is a form of shallow text processing which locates a specified set of relevant items in natural language documents. Such systems can be useful, but require domain-specific knowledge and rules, and are time-consuming and difficult to build by hand, making infomation extraction a good testbed for the application of machine learning techniques to natural language processing. This paper presents a system, RAPIER, that takes pairs of documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. The learning algorithm incorporates techniques from several inductive logic programming systems and learns unbounded patterns that include constraints on the words and part-of-speech tags surrounding the filler. Encouraging results are presented on learning to extract information from computer job postings from the newsgroup misc.jobs.offered.
    ML ID: 80
  219. Integrating Abduction and Induction in Machine Learning
    [Details] [PDF]
    Raymond J. Mooney
    In Working Notes of the IJCAI-97 Workshop on Abduction and Induction in AI, 37--42, Nagoya, Japan, August 1997.
    This paper discusses the integration of traditional abductive and inductive reasoning methods in the development of machine learning systems. In particular, the paper discusses our recent work in two areas: 1) The use of traditional abductive methods to propose revisions during theory refinement, where an existing knowledge base is modified to make it consistent with a set of empirical data; and 2) The use of inductive learning methods to automatically acquire from examples a diagnostic knowledge base used for abductive reasoning.
    ML ID: 79
  220. Relational Learning Techniques for Natural Language Information Extraction
    [Details] [PDF]
    Mary Elaine Califf
    1997. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This paper presents a novel rule representation specific to natural language and a learning system, RAPIER, which learns information extraction rules. RAPIER takes pairs of documents and filled templates indicating the information to be extracted and learns patterns to extract fillers for the slots in the template. This proposal presents initial results on a small corpus of computer-related job postings with a preliminary version of RAPIER. Future research will involve several enhancements to RAPIER as well as more thorough testing on several domains and extension to additional natural language processing tasks. We intend to extend the rule representation and algorithm to allow for more types of constraints than are currently supported. We also plan to incorporate active learning, or sample selection, methods, specifically query by committee, into RAPIER. These methods have the potential to substantially reduce the amount of annotation required. We will explore the issue of distinguishing relevant and irrelevant messages, since currently RAPIER only extracts from the any messages given to it, assuming that all are relevant. We also intend to run much larger tests with RAPIER on multiple domains including the terrorism domain from the third and fourth Message Uncderstanding Conferences, which will allow comparison against other systems. Finally, we plan to demonstrate the generality of RAPIER`s representation and algorithm by applying it to other natural language processing tasks such as word sense disambiguation.
    ML ID: 78
  221. Learning to Improve both Efficiency and Quality of Planning
    [Details] [PDF]
    Tara A. Estlin and Raymond J. Mooney
    In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), 1227-1232, Nagoya, Japan, 1997.
    Most research in learning for planning has concentrated on efficiency gains. Another important goal is improving the quality of final plans. Learning to improve plan quality has been examined by a few researchers, however, little research has been done learning to improve both efficiency and quality. This paper explores this problem by using the SCOPE learning system to acquire control knowledge that improves on both of these metrics. Since SCOPE uses a very flexible training approach, we can easily focus it's learning algorithm to prefer search paths that are better for particular evaluation metrics. Experimental results show that SCOPE can significantly improve both the quality of final plans and overall planning efficiency.
    ML ID: 77
  222. Applying ILP-based Techniques to Natural Language Information Extraction: An Experiment in Relational Learning
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    In Workshop Notes of the IJCAI-97 Workshop on Frontiers of Inductive Logic Programming, 7--11, Nagoya, Japan, August 1997.
    Information extraction systems process natural language documents and locate a specific set of relevant items. Given the recent success of empirical or corpus-based approaches in other areas of natural language processing, machine learning has the potential to significantly aid the development of these knowledge-intensive systems. This paper presents a system, RAPIER, that takes pairs of documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. The learning algorithm incorporates techniques from several inductive logic programming systems and learns unbounded patterns that include constraints on the words and part-of-speech tags surrounding the filler. Encouraging results are presented on learning to extract information from computer job postings from the newsgroup misc.jobs.offered.
    ML ID: 76
  223. Learning to Parse Natural Language Database Queries into Logical Form
    [Details] [PDF]
    Cynthia A. Thompson, Raymond J. Mooney, and Lappoon R. Tang
    In Proceedings of the ML-97 Workshop on Automata Induction, Grammatical Inference, and Language Acquisition, Nashville, TN, July 1997.
    For most natural language processing tasks, a parser that maps sentences into a semantic representation is significantly more useful than a grammar or automata that simply recognizes syntactically well-formed strings. This paper reviews our work on using inductive logic programming methods to learn deterministic shift-reduce parsers that translate natural language into a semantic representation. We focus on the task of mapping database queries directly into executable logical form. An overview of the system is presented followed by recent experimental results on corpora of Spanish geography queries and English job-search queries.
    ML ID: 75
  224. Relational Learning of Pattern-Match Rules for Information Extraction
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of the ACL Workshop on Natural Language Learning, 9-15, Madrid, Spain, July 1997.
    Information extraction systems process natural language documents and locate a specific set of relevant items. Given the recent success of empirical or corpus-based approaches in other areas of natural language processing, machine learning has the potential to significantly aid the development of these knowledge-intensive systems. This paper presents a system, RAPIER, that takes pairs of documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. The learning algorithm incorporates techniques from several inductive logic programming systems and learns unbounded patterns that include constraints on the words and part-of-speech tags surrounding the filler. Encouraging results are presented on learning to extract information from computer job postings from the newsgroup misc.jobs.offered.
    ML ID: 74
  225. Learning Parse and Translation Decisions From Examples With Rich Context
    [Details] [PDF]
    Ulf Hermjakob and Raymond J. Mooney
    In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL'97/EACL'97), 482-489, July 1997.
    This paper presents a knowledge and context-based system for parsing and translating natural language and evaluates it on sentences from the Wall Street Journal. Applying machine learning techniques, the system uses parse action examples acquired under supervision to generate a deterministic shift-reduce parser in the form of a decision structure. It relies heavily on context, as encoded in features which describe the morpholgical, syntactical, semantical and other aspects of a given parse state.
    ML ID: 73
  226. Learning Parse and Translation Decisions From Examples With Rich Context
    [Details] [PDF]
    Ulf Hermjakob
    PhD Thesis, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, May 1997. 175 pages. Technical Report UT-AI97-261.
    The parsing of unrestricted text, with its enormous lexical and structural ambiguity, still poses a great challenge in natural language processing. The difficulties with traditional approaches, which try to master the complexity of parse grammars with hand-crafted rules, have led to a trend towards more empirical techniques.

    We therefore propose a system for parsing and translating natural language that learns from examples and uses some background knowledge.
    As our parsing model we choose a deterministic shift-reduce type parser that integrates part-of-speech tagging and syntactic and semantic processing, which not only makes parsing very efficient, but also assures transparency during the supervised example acquisition.
    Applying machine learning techniques, the system uses parse action examples to generate a parser in the form of a decision structure, a generalization of decision trees.
    To learn good parsing and translation decisions, our system relies heavily on context, as encoded in currently 205 features describing the morphological, syntactical and semantical aspects of a given parse state. Compared with recent probabilistic systems that were trained on 40,000 sentences, our system relies on more background knowledge and a deeper analysis, but radically fewer examples, currently 256 sentences.

    We test our parser on lexically limited sentences from the Wall Street Journal and achieve accuracy rates of 89.8% for labeled precision, 98.4% for part of speech tagging and 56.3% of test sentences without any crossing brackets. Machine translations of 32 Wall Street Journal sentences to German have been evaluated by 10 bilingual volunteers and been graded as 2.4 on a 1.0 (best) to 6.0 (worst) scale for both grammatical correctness and meaning preservation. The translation quality was only minimally better (2.2) when starting each translation with the correct parse tree, which indicates that the parser is quite robust and that its errors have only a moderate impact on final trans- lation quality. These parsing and translation results already compare well with other systems and, given the relatively small training set and amount of overall knowledge used so far, the results suggest that our system Contex can break previous accuracy ceilings when scaled up further.

    ML ID: 72
  227. An Inductive Logic Programming Method for Corpus-based Parser Construction
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    January 1997. Unpublished Technical Note.
    Empirical methods for building natural language systems has become an important area of research in recent years. Most current approaches are based on propositional learning algorithms and have been applied to the problem of acquiring broad-coverage parsers for relatively shallow (syntactic) representations. This paper outlines an alternative empirical approach based on techniques from a subfield of machine learning known as Inductive Logic Programming (ILP). ILP algorithms, which learn relational (first-order) rules, are used in a parser acquisition system called CHILL that learns rules to control the behavior of a traditional shift-reduce parser. Using this approach, CHILL is able to learn parsers for a variety of different types of analyses, from traditional syntax trees to more meaning-oriented case-role and database query forms. Experimental evidence shows that CHILL performs comparably to propositional learning systems on similar tasks, and is able to go beyond the broad-but-shallow paradigm and learn mappings directly from sentences into useful semantic representations. In a complete database-query application, parsers learned by CHILL outperform an existing hand-crafted system, demonstrating the promise of empricial techniques for automating the construction certain NLP systems.
    ML ID: 71
  228. Parameter Revision Techniques for Bayesian Networks with Hidden Variables: An Experimental Comparison
    [Details] [PDF]
    Sowmya Ramachandran and Raymond J. Mooney
    January 1997. Unpublished Technical Note.
    Learning Bayesian networks inductively in the presence of hidden variables is still an open problem. Even the simpler task of learning just the conditional probabilities on a Bayesian network with hidden variables is not completely solved. In this paper, we present an approach that learns the parameters of a Bayesian network composed of noisy-or and noisy-and nodes by using a gradient descent back-propagation approach similar to that used to train neural networks. For the task of causal inference, it has the advantage of being able to learn in the presence of hidden variables. We compare the performance of this approach with the adaptive probabilistic networks technique on a real-world classification problem in molecular biology, and show that our approach trains faster and learns networks with higher classification accuracy.
    ML ID: 70
  229. Semantic Lexicon Acquisition for Learning Parsers
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    1997. Submitted for review.
    This paper describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that learns a semantic lexicon from a corpus of sentences paired with representations of their meaning. The lexicon learned consists of words paired with representations of their meaning, and allows for both synonymy and polysemy. WOLFIE is part of an integrated system that learns to parse novel sentences into their meaning representations. Experimental results are presented that demonstrate WOLFIE's ability to learn useful lexicons for a realistic domain. The lexicons learned by WOLFIE are also compared to those learned by another lexical acquisition system, that of Siskind (1996).
    ML ID: 69
  230. Inductive Logic Programming for Natural Language Processing
    [Details] [PDF]
    Raymond J. Mooney
    In Stephen Muggleton, editors, Inductive Logic Programming: Selected papers from the 6th International Workshop, 3-22, Berlin, 1996. Springer Verlag.
    This paper reviews our recent work on applying inductive logic programming to the construction of natural language processing systems. We have developed a system, CHILL, that learns a parser from a training corpus of parsed sentences by inducing heuristics that control an initial overly-general shift-reduce parser. CHILL learns syntactic parsers as well as ones that translate English database queries directly into executable logical form. The ATIS corpus of airline information queries was used to test the acquisition of syntactic parsers, and CHILL performed competitively with recent statistical methods. English queries to a small database on U.S. geography were used to test the acquisition of a complete natural language interface, and the parser that CHILL acquired was more accurate than an existing hand-coded system. The paper also includes a discussion of several issues this work has raised regarding the capabilities and testing of ILP systems as well as a summary of our current research directions.
    ML ID: 68
  231. Integrating Explanation-Based and Inductive Learning Techniques to Acquire Search-Control for Planning
    [Details] [PDF]
    Tara A. Estlin
    Technical Report AI96-250, Department of Computer Sciences, University of Texas, Austin, TX, 1996.
    Planning systems have become an important tool for automating a wide variety of tasks. Control knowledge guides a planner to find solutions quickly and is crucial for efficient planning in most domains. Machine learning techniques enable a planning system to automatically acquire domain-specific search-control knowledge for different applications. Past approaches to learning control information have usually employed explanation-based learning (EBL) to generate control rules. Unfortunately, EBL alone often produces overly complex rules that actually decrease rather than improve overall planning efficiency. This paper presents a novel learning approach for control knowledge acquisition that integrates explanation-based learning with techniques from inductive logic programming. In our learning system SCOPE, EBL is used to constrain an inductive search for control heuristics that help a planner choose between competing plan refinements. SCOPE is one of the few systems to address learning control information for newer, partial-order planners. Specifically, this proposal describes how SCOPE learns domain-specific control rules for the UCPOP planning algorithm. The resulting system is shown to produce significant speedup in two different planning domains, and to be more effective than a pure EBL approach.

    Future research will be performed in three main areas. First, SCOPE's learning algorithm will be extended to include additional techniques such as constructive induction and rule utility analysis. Second, SCOPE will be more thoroughly tested; several real-world planning domains have been identified as possible testbeds, and more in-depth comparisons will be drawn between SCOPE and other competing approaches. Third, SCOPE will be implemented in a different planning system in order to test its portability to other planning algorithms. This work should demonstrate that machine-learning techniques can be a powerful tool in the quest for tractable real-world planning.

    ML ID: 67
  232. Learning to Parse Database Queries using Inductive Logic Programming
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In AAAI/IAAI, 1050-1055, Portland, OR, August 1996. AAAI Press/MIT Press.
    This paper presents recent work using the CHILL parser acquisition system to automate the construction of a natural-language interface for database queries. CHILL treats parser acquisition as the learning of search-control rules within a logic program representing a shift-reduce parser and uses techniques from Inductive Logic Programming to learn relational control knowledge. Starting with a general framework for constructing a suitable logical form, CHILL is able to train on a corpus comprising sentences paired with database queries and induce parsers that map subsequent sentences directly into executable queries. Experimental results with a complete database-query application for U.S. geography show that CHILL is able to learn parsers that outperform a pre-existing, hand-crafted counterpart. These results demonstrate the ability of a corpus-based system to produce more than purely syntactic representations. They also provide direct evidence of the utility of an empirical approach at the level of a complete natural language application.
    ML ID: 66
  233. A Novel Application of Theory Refinement to Student Modeling
    [Details] [PDF]
    Paul Baffes and Raymond J. Mooney
    In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), 403-408, Portland, OR, August 1996.
    Theory refinement systems developed in machine learning automatically modify a knowledge base to render it consistent with a set of classified training examples. We illustrate a novel application of these techniques to the problem of constructing a student model for an intelligent tutoring system (ITS). Our approach is implemented in an ITS authoring system called Assert which uses theory refinement to introduce errors into an initially correct knowledge base so that it models incorrect student behavior. The efficacy of the approach has been demonstrated by evaluating a tutor developed with Assert with 75 students tested on a classification task covering concepts from an introductory course on the C++ programming language. The system produced reasonably accurate models and students who received feedback based on these models performed significantly better on a post test than students who received simple reteaching.
    ML ID: 65
  234. Qualitative Multiple-Fault Diagnosis of Continuous Dynamic Systems Using Behavioral Modes
    [Details] [PDF]
    Siddarth Subramanian and Raymond J. Mooney
    In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), 965-970, Portland, OR, August 1996.
    Most model-based diagnosis systems, such as GDE and Sherlock, have concerned discrete, static systems such as logic circuits and use simple constraint propagation to detect inconsistencies. However, sophisticated systems such as QSIM and QPE have been developed for qualitative modeling and simulation of continuous dynamic systems. We present an integration of these two lines of research as implemented in a system called QDOCS for multiple-fault diagnosis of continuous dynamic systems using QSIM models. The main contributions of the algorithm include a method for propagating dependencies while solving a general constraint satisfaction problem and a method for verifying the consistency of a behavior with a model across time. Through systematic experiments on two realistic engineering systems, we demonstrate that QDOCS demonstrates the best balance of generality, accuracy, and efficiency among competing methods.
    ML ID: 64
  235. Multi-Strategy Learning of Search Control for Partial-Order Planning
    [Details] [PDF]
    Tara A. Estlin and Raymond J. Mooney
    In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), 843-848, Portland, OR, August 1996.
    Most research in planning and learning has involved linear, state-based planners. This paper presents SCOPE, a system for learning search-control rules that improve the performance of a partial-order planner. SCOPE integrates explanation-based and inductive learning techniques to acquire control rules for a partial-order planner. Learned rules are in the form of selection heuristics that help the planner choose between competing plan refinements. Specifically, SCOPE learns domain-specific control rules for a version of the UCPOP planning algorithm. The resulting system is shown to produce significant speedup in two different planning domains.
    ML ID: 63
  236. Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-96), 82-91, Philadelphia, PA, 1996.
    This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neural-network, decision-tree, rule-based, and case-based classification techniques. The specific problem tested involves disambiguating six senses of the word ``line'' using the words in the current and proceeding sentence as context. The statistical and neural-network methods perform the best on this particular problem and we discuss a potential reason for this observed difference. We also discuss the role of bias in machine learning and its importance in explaining performance differences observed on specific problems.
    ML ID: 62
  237. Combining Symbolic and Connectionist Learning Methods to Refine Certainty-Factor Rule-Bases
    [Details] [PDF]
    J. Jeffrey Mahoney
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, May 1996. 113 pages.
    This research describes the system RAPTURE, which is designed to revise rule bases expressed in certainty-factor format. Recent studies have shown that learning is facilitated when biased with domain-specific expertise, and have also shown that many real-world domains require some form of probabilistic or uncertain reasoning in order to successfully represent target concepts. RAPTURE was designed to take advantage of both of these results.

    Beginning with a set of certainty-factor rules, along with accurately-labelled training examples, RAPTURE makes use of both symbolic and connectionist learning techniques for revising the rules, in order that they correctly classify all of the training examples. A modified version of backpropagation is used to adjust the certainty factors of the rules, ID3's information-gain heuristic is used to add new rules, and the Upstart algorithm is used to create new hidden terms in the rule base.

    Results on refining four real-world rule bases are presented that demonstrate the effectiveness of this combined approach. Two of these rule bases were designed to identify particular areas in strands of DNA, one is for identifying infectious diseases, and the fourth attempts to diagnose soybean diseases. The results of RAPTURE are compared with those of backpropagation, C4.5, KBANN, and other learning systems. RAPTURE generally produces sets of rules that are more accurate that these other systems, often creating smaller sets of rules and using less training time.

    ML ID: 61
  238. Integrating EBL and ILP to Acquire Control Rules for Planning
    [Details] [PDF]
    Tara A. Estlin and Raymond J. Mooney
    In Proceedings of the Third International Workshop on Multi-Strategy Learning (MSL-96), 271--279, Harpers Ferry, WV, May 1996.
    Most approaches to learning control information in planning systems use explanation-based learning to generate control rules. Unfortunately, EBL alone often produces overly complex rules that actually decrease planning efficiency. This paper presents a novel learning approach for control knowledge acquisition that integrates explanation-based learning with techniques from inductive logic programming. EBL is used to constrain an inductive search for selection heuristics that help a planner choose between competing plan refinements. SCOPE is one of the few systems to address learning control information in the newer partial-order planners. Specifically, SCOPE learns domain-specific control rules for a version of the UCPOP planning algorithm. The resulting system is shown to produce significant speedup in two different planning domains.
    ML ID: 60
  239. Refinement-Based Student Modeling and Automated Bug Library Construction
    [Details] [PDF]
    Paul Baffes and Raymond Mooney
    Journal of Artificial Intelligence in Education, 7(1):75-116, 1996.
    A critical component of model-based intelligent tutoring sytems is a mechanism for capturing the conceptual state of the student, which enables the system to tailor its feedback to suit individual strengths and weaknesses. To be useful such a modeling technique must be practical, in the sense that models are easy to construct, and effective, in the sense that using the model actually impacts student learning. This research presents a new student modeling technique which can automatically capture novel student errors using only correct domain knowledge, and can automatically compile trends across multiple student models. This approach has been implemented as a computer program, ASSERT, using a machine learning technique called theory refinement, which is a method for automatically revising a knowledge base to be consistent with a set of examples. Using a knowledge base that correctly defines a domain and examples of a student's behavior in that domain, ASSERT models student errors by collecting any refinements to the correct knowledege base which are necessary to account for the student's behavior. The efficacy of the approach has been demonstrated by evaluating ASSERT using 100 students tested on a classification task covering concepts from an introductory course on the C++ programming language. Students who received feedback based on the models automatically generated by ASSERT performed significantly better on a post test than students who received simple teaching.
    ML ID: 59
  240. Revising Bayesian Network Parameters Using Backpropagation
    [Details] [PDF]
    Sowmya Ramachandran and Raymond J. Mooney
    In Proceedings of the International Conference on Neural Networks (ICNN-96), Special Session on Knowledge-Based Artificial Neural Networks, 82--87, Washington DC, June 1996.
    The problem of learning Bayesian networks with hidden variables is known to be a hard problem. Even the simpler task of learning just the conditional probabilities on a Bayesian network with hidden variables is hard. In this paper, we present an approach that learns the conditional probabilities on a Bayesian network with hidden variables by transforming it into a multi-layer feedforward neural network (ANN). The conditional probabilities are mapped onto weights in the ANN, which are then learned using standard backpropagation techniques. To avoid the problem of exponentially large ANNs, we focus on Bayesian networks with noisy-or and noisy-and nodes. Experiments on real world classification problems demonstrate the effectiveness of our technique.
    ML ID: 58
  241. Corpus-Based Lexical Acquisition For Semantic Parsing
    [Details] [PDF]
    Cynthia Thompson
    February 1996. Ph.D. proposal.
    Building accurate and efficient natural language processing (NLP) systems is an important and difficult problem. There has been increasing interest in automating this process. The lexicon, or the mapping from words to meanings, is one component that is typically difficult to update and that changes from one domain to the next. Therefore, automating the acquisition of the lexicon is an important task in automating the acquisition of NLP systems. This proposal describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that learns a lexicon from input consisting of sentences paired with representations of their meanings. Preliminary experimental results show that this system can learn correct and useful mappings. The correctness is evaluated by comparing a known lexicon to one learned from the training input. The usefulness is evaluated by examining the effect of using the lexicon learned by WOLFIE to assist a parser acquisition system, where previously this lexicon had to be hand-built. Future work in the form of extensions to the algorithm, further evaluation, and possible applications is discussed.
    ML ID: 57
  242. Lexical Acquisition: A Novel Machine Learning Problem
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    Technical Report, Artificial Intelligence Lab, University of Texas at Austin, January 1996.
    This paper defines a new machine learning problem to which standard machine learning algorithms cannot easily be applied. The problem occurs in the domain of lexical acquisition. The ambiguous and synonymous nature of words causes the difficulty of using standard induction techniques to learn a lexicon. Additionally, negative examples are typically unavailable or difficult to construct in this domain. One approach to solve the lexical acquisition problem is presented, along with preliminary experimental results on an artificial corpus. Future work includes extending the algorithm and performing tests on a more realistic corpus.
    ML ID: 56
  243. Advantages of Decision Lists and Implicit Negative in Inductive Logic Programming
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    Technical Report, Artificial Intelligence Lab, University of Texas at Austin, January 1996.
    This paper demonstrates the capabilities of FOIDL, an inductive logic programming (ILP) system whose distinguishing characteristics are the ability to produce first-order decision lists, the use of an output completeness assumption to provide implicit negative examples, and the use of intensional background knowledge. The development of FOIDL was originally motivated by the problem of learning to generate the past tense of English verbs; however, this paper demonstrates its superior performance on two different sets of benchmark ILP problems. Tests on the finite element mesh design problem show that FOIDL's decision lists enable it to produce better results than all other ILP systems whose results on this problem have been reported. Tests with a selection of list-processing problems from Bratko's introductory Prolog text demonstrate t hat the combination of implicit negatives and intensionality allow FOIDL to learn correct programs from far fewer examples than FOIL.
    ML ID: 55
  244. Comparative Results on Using Inductive Logic Programming for Corpus-based Parser Construction
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In Stefan Wermter and Ellen Riloff and Gabriela Scheler, editors, Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 355-369, Berlin, 1996. Springer.
    This paper presents results from recent experiments with CHILL, a corpus-based parser acquisition system. CHILL treats language acquisition as the learning of search-control rules within a logic program. Unlike many current corpus-based approaches that use statistical learning algorithms, CHILL uses techniques from inductive logic programming (ILP) to learn relational representations. CHILL is a very flexible system and has been used to learn parsers that produce syntactic parse trees, case-role analyses, and executable database queries. The reported experiments compare CHILL's performance to that of a more naive application of ILP to parser acquisition. The results show that ILP techniques, as employed in CHILL, are a viable alternative to statistical methods and that the control-rule framework is fundamental to CHILL's success.
    ML ID: 54
  245. Learning the Past Tense of English Verbs Using Inductive Logic Programming
    [Details] [PDF]
    Raymond J. Mooney and Mary Elaine Califf
    In {S. Wermter, E. Riloff} and G. Scheler, editors, Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 370-384, Berlin, 1996. Springer.
    This paper presents results on using a new inductive logic programming method called FOIDL to learn the past tense of English verbs. The past tense task has been widely studied in the context of the symbolic/connectionist debate. Previous papers have presented results using various neural-network and decision-tree learning methods. We have developed a technique for learning a special type of Prolog program called a first-order decision list, defined as an ordered list of clauses each ending in a cut. FOIDL is based on FOIL (Quinlan, 1990) but employs intensional background knowledge and avoids the need for explicit negative examples. It is particularly useful for problems that involve rules with specific exceptions, such as the past-tense task. We present results showing that FOIDL learns a more accurate past-tense generator from significantly fewer examples than all other previous methods.
    ML ID: 53
  246. Hybrid Learning of Search Control for Partial-Order Planning
    [Details] [PDF]
    Tara A. Estlin and Raymond J. Mooney
    In Malik Ghallab and Alfredo Milani, editors, New Directions in AI Planning, 129-140, Amsterdam, 1996. IOS Press.
    This paper presents results on applying a version of the DOLPHIN search-control learning system to speed up a partial-order planner. DOLPHIN integrates explanation-based and inductive learning techniques to acquire effective clause-selection rules for Prolog programs. A version of the UCPOP partial-order planning algorithm has been implemented as a Prolog program and DOLPHIN used to automatically learn domain-specific search control rules that help eliminate backtracking. The resulting system is shown to produce significant speedup in several planning domains.
    ML ID: 52
  247. Refinement of Bayesian Networks by Combining Connectionist and Symbolic Techniques
    [Details] [PDF]
    Sowmya Ramachandran
    1995. Unpublished Ph.D. Thesis Proposal.
    Bayesian networks provide a mathematically sound formalism for representing and reasoning with uncertain knowledge and are as such widely used. However, acquiring and capturing knowledge in this framework is difficult. There is a growing interest in formulating techniques for learning Bayesian networks inductively. While the problem of learning a Bayesian network, given complete data, has been explored in some depth, the problem of learning networks with unobserved causes is still open. In this proposal, we view this problem from the perspective of theory revision and present a novel approach which adapts techniques developed for revising theories in symbolic and connectionist representations. Thus, we assume that the learner is given an initial approximate network (usually obtained from a expert). Our technique inductively revises the network to fit the data better. Our proposed system has two components: one component revises the parameters of a Bayesian network of known structure, and the other component revises the structure of the network. The component for parameter revision maps the given Bayesian network into a multi-layer feedforward neural network, with the parameters mapped to weights in the neural network, and uses standard backpropagation techniques to learn the weights. The structure revision component uses qualitative analysis to suggest revisions to the network when it fails to predict the data accurately. The first component has been implemented and we will present results from experiments on real world classification problems which show our technique to be effective. We will also discuss our proposed structure revision algorithm, our plans for experiments to evaluate the system, as well as some extensions to the system.
    ML ID: 51
  248. Inducing Logic Programs without Explicit Negative Examples
    [Details] [PDF]
    John M. Zelle, Cynthia A. Thompson, Mary Elaine Califf, and Raymond J. Mooney
    In Proceedings of the Fifth International Workshop on Inductive Logic Programming (ILP-95), 403-416, Leuven, Belgium, 1995.
    This paper presents a method for learning logic programs without explicit negative examples by exploiting an assumption of output completeness. A mode declaration is supplied for the target predicate and each training input is assumed to be accompanied by all of its legal outputs. Any other outputs generated by an incomplete program implicitly represent negative examples; however, large numbers of ground negative examples never need to be generated. This method has been incorporated into two ILP systems, CHILLIN and IFOIL, both of which use intensional background knowledge. Tests on two natural language acquisition tasks, case-role mapping and past-tense learning, illustrate the advantages of the approach.
    ML ID: 50
  249. Qualitative Multiple-Fault Diagnosis of Continuous Dynamic Systems Using Behavioral Modes
    [Details] [PDF]
    Siddarth Subramanian
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, 1995. 128 pages. Also appears as Technical Report AI 95-239.
    As systems like chemical plants, power plants, and automobiles get more complex, online diagnostic systems are becomingly increasingly important. One of the ways to rein in the complexity of describing and reasoning about large systems such as these is to describe them using qualitative rather than quantitative models.

    Model-based diagnosis is a class of diagnostic techniques that use direct knowledge about how a system functions instead of expert rules detailing causes for every possible set of symptons of a broken system. Our research builds on standard methods for model-based diagnosis and extends them to the domain of complex dynamic systems described using qualitative models.

    We motivate and describe out algorithm for diagnosing faults in a dynamic system given a qualitative model and a sequence of qualitative states. The main contributions in this algorithm include a method for propagating dependencies while solving a general constraint satisfaction problem, and a method for verfying the compatibility of a behavior with a model across time. The algorithm can diagnose multiple faults and uses models of faulty behavior, or behavioral modes.

    We then demonstrate these techniques using an implemented program called QDOCS and test it on some realistic problems. Through our experiments with a model of the reaction control system (RCS) of the space shuttle and with a level-controller for a reaction tank, we show that QDOCS demonstrates the best balance of generality, accuracy and efficiency among known systems.

    ML ID: 49
  250. Using Inductive Logic Programming to Automate the Construction of Natural Language Parsers
    [Details] [PDF]
    John M. Zelle
    PhD Thesis, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, 1995.
    Designing computer systems to understand natural language input is a difficult task. In recent years there has been considerable interest in corpus-based methods for constructing natural language parsers. These empirical approaches replace hand-crafted grammars with linguistic models acquired through automated training over language corpora. A common thread among such methods to date is the use of propositional or probablistic representations for the learned knowledge. This dissertation presents an alternative approach based on techniques from a subfield of machine learning known as inductive logic programming (ILP). ILP, which investigates the learning of relational (first-order) rules, provides an empirical method for acquiring knowledge within traditional, symbolic parsing frameworks.

    This dissertation details the architecture, implementation and evaluation of CHILL a computer system for acquiring natural language parsers by training over corpora of parsed text. CHILL treats language acquisition as the learning of search-control rules within a logic program that implements a shift-reduce parser. Control rules are induced using a novel ILP algorithm which handles difficult issues arising in the induction of search-control heuristics. Both the control-rule framework and the induction algorithm are crucial to CHILL's success.

    The main advantage of CHILL over propositional counterparts is its flexibility in handling varied representations. CHILL has produced parsers for various analyses including case-role mapping, detailed syntactic parse trees, and a logical form suitable for expressing first-order database queries. All of these tasks are accomplished within the same framework, using a single, general learning method that can acquire new syntactic and semantic categories for resolving ambiguities.

    Experimental evidence from both aritificial and real-world corpora demonstrate that CHILL learns parsers as well or better than previous artificial neural network or probablistic approaches on comparable tasks. In the database query domain, which goes beyond the scope of previous empirical approaches, the learned parser outperforms an existing hand-crafted system. These results support the claim that ILP techniques as implemented in CHILL represent a viable alternative with significant potential advantages over neural-network, propositional, and probablistic approaches to empirical parser construction.

    ML ID: 48
  251. A Comparison of Two Methods Employing Inductive Logic Programming for Corpus-based Parser Constuction
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In Working Notes of the IJCAI-95 Workshop on New Approaches to Learning for Natural Language Processing, 79--86, Montreal, Quebec, Canada, August 1995.
    This paper presents results from recent experiments with CHILL, a corpus-based parser acquisition system. CHILL treats grammar acquisition as the learning of search-control rules within a logic program. Unlike many current corpus-based approaches that use propositional or probabilistic learning algorithms, CHILL uses techniques from inductive logic programming (ILP) to learn relational representations. The reported experiments compare CHILL's performance to that of a more naive application of ILP to parser acquisition. The results show that ILP techniques, as employed in CHILL, are a viable alternative to propositional methods and that the control-rule framework is fundamental to CHILL's success.
    ML ID: 47
  252. Multiple-Fault Diagnosis Using General Qualitative Models with Fault Modes
    [Details] [PDF]
    Siddarth Subramanian and Raymond J. Mooney
    In Working Notes of the IJCAI-95 Workshop on Engneering Problems for Qualitative Reasoning, Monreal, Quebec, August 1995.
    This paper describes an approach to diagnosis of systems described by qualitative differential equations represented as QSIM models. An implemented system QDOCS is described that performs multiple-fault, fault-model based diagnosis, using constraint satisfaction techniques, of qualitative behaviors of systems described by such models. We demonstrate the utility of this system by accurately diagnosing randomly generated faults using simulated behaviors of a portion of the Reaction Control System of the space shuttle.
    ML ID: 46
  253. Acquisition of a Lexicon from Semantic Representations of Sentences
    [Details] [PDF]
    Cynthia A. Thompson
    In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL-95), 335-337, Cambridge, MA, 1995.
    A system, WOLFIE, that acquires a mapping of words to their semantic representation is presented and a preliminary evaluation is performed. Tree least general generalizations (TLGGs) of the representations of input sentences are performed to assist in determining the representations of individual words in the sentences. The best guess for a meaning of a word is the TLGG which overlaps with the highest percentage of sentence representations in which that word appears. Some promising experimental results on a non-artificial data set are presented.
    ML ID: 45
  254. Induction of First-Order Decision Lists: Results on Learning the Past Tense of English Verbs
    [Details] [PDF]
    Raymond J. Mooney and Mary Elaine Califf
    Journal of Artificial Intelligence Research, 3:1-24, 1995.
    This paper presents a method for inducing logic programs from examples that learns a new class of concepts called first-order decision lists, defined as ordered lists of clauses each ending in a cut. The method, called FOIDL, is based on FOIL but employs intensional background knowledge and avoids the need for explicit negative examples. It is particularly useful for problems that involve rules with specific exceptions, such as learning the past-tense of English verbs, a task widely studied in the context of the symbolic/connectionist debate. FOIDL is able to learn concise, accurate programs for this problem from significantly fewer examples than previous methods (both connectionist and symbolic).
    ML ID: 44
  255. Automated Refinement of First-Order Horn-Clause Domain Theories
    [Details] [PDF]
    Bradley L. Richards and Raymond J. Mooney
    Machine Learning, 19(2):95-131, 1995.
    Knowledge acquisition is a difficult and time-consuming task, and as error-prone as any human activity. The task of automatically improving an existing knowledge base using learning methods is addressed by a new class of systems performing theory refinement. Until recently, such systems were limited to propositional theories. This paper presents a system, FORTE (First-Order Revision of Theories from Examples), for refining first-order Horn-clause theories. Moving to a first-order representation opens many new problem areas, such as logic program debugging and qualitative modelling, that are beyond the reach of propositional systems. FORTE uses a hill-climbing approach to revise theories. It identifies possible errors in the theory and calls on a library of operators to develop possible revisions. The best revision is implemented, and the process repeats until no further revisions are possible. Operators are drawn from a variety of sources, including propositional theory refinement, first-order induction, and inverse resolution. FORTE has been tested in several domains including logic programming and qualitative modelling.
    ML ID: 43
  256. Encouraging Experimental Results on Learning CNF
    [Details] [PDF]
    Raymond J. Mooney
    Machine Learning, 19(1):79-92, 1995.
    This paper presents results comparing three inductive learning systems using different representations for concepts, namely: CNF formulae, DNF formulae, and decision trees. The CNF learner performs surprisingly well. Results on five natural data sets show that it frequently trains faster and produces more accurate and simpler concepts. The probable explanation for its superior performance is that the other systems are more susceptible to the replication problem.
    ML ID: 42
  257. A Preliminary PAC Analysis of Theory Revision
    [Details] [PDF]
    Raymond J. Mooney
    In T. Petsche and S. Hanson and Jude W. Shavlik, editors, Computational Learning Theory and Natural Learning Systems, Vol. 3, 43-53, Cambridge, MA, 1995. MIT Press.
    This paper presents a preliminary analysis of the sample complexity of theory revision within the framework of PAC (Probably Approximately Correct) learnability theory. By formalizing the notion that the initial theory is ``close'' to the correct theory we show that the sample complexity of an optimal propositional Horn-clause theory revision algorithm is $O( ( ln 1 / delta + d ln (s_0 + d + n) ) / epsilon)$, where $d$ is the syntactic distance between the initial and correct theories, $s_0$ is the size of initial theory, $n$ is the number of observable features, and $epsilon$ and $delta$ are the standard PAC error and probability bounds. The paper also discusses the problems raised by the computational complexity of theory revision.
    ML ID: 41
  258. Automatic Student Modeling and Bug Library Construction using Theory Refinement
    [Details] [PDF]
    Paul T. Baffes
    PhD Thesis, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, 1994.
    The history of computers in education can be characterized by a continuing effort to construct intelligent tutorial programs which can adapt to the individual needs of a student in a one-on-one setting. A critical component of these intelligent tutorials is a mechanism for modeling the conceptual state of the student so that the system is able to tailor its feedback to suit individual strengths and weaknesses. The primary contribution of this research is a new student modeling technique which can automatically capture novel student errors using only correct domain knowledge, and can automatically compile trends across multiple student models into bug libraries. This approach has been implemented as a computer program, ASSERT, using a machine learning technique called theory refinement which is a method for automatically revising a knowledge base to be consistent with a set of examples. Using a knowledge base that correctly defines a domain and examples of a student's behavior in that domain, ASSERT models student errors by collecting any refinements to the correct knowledge base which are necessary to account for the student's behavior. The efficacy of the approach has been demonstrated by evaluating ASSERT using 100 students tested on a classification task using concepts from an introductory course on the C++ programming language. Students who received feedback based on the models automatically generated by ASSERT performed significantly better on a post test than students who received simple reteaching.
    ML ID: 40
  259. Multiple-Fault Diagnosis Using General Qualitative Models with Fault Modes
    [Details] [PDF]
    Siddarth Subramanian and Raymond J. Mooney
    In Working Papers of the Fifth International Workshop on Principles of Diagnosis, 321-325, New Paltz, NY, October 1994.
    This paper describes an approach to diagnosis of systems described by qualitative differential equations represented as QSIM models. An implemented system QDOCS is described that performs multiple-fault, fault-model based diagnosis, using constraint satisfaction techniques, of qualitative behaviors of systems described by such models. We demonstrate the utility of this system by accurately diagnosing randomly generated faults using simulated behaviors of a portion of the Reaction Control System of the space shuttle.
    ML ID: 39
  260. Inductive Learning For Abductive Diagnosis
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), 664-669, Seattle, WA, August 1994.
    A new inductive learning system, LAB (Learning for ABduction), is presented which acquires abductive rules from a set of training examples. The goal is to find a small knowledge base which, when used abductively, diagnoses the training examples correctly and generalizes well to unseen examples. This contrasts with past systems that inductively learn rules that are used deductively. Each training example is associated with potentially multiple categories (disorders), instead of one as with typical learning systems. LAB uses a simple hill-climbing algorithm to efficiently build a rule base for a set-covering abductive system. LAB has been experimentally evaluated and compared to other learning systems and an expert knowledge base in the domain of diagnosing brain damage due to stroke.
    ML ID: 38
  261. Comparing Methods For Refining Certainty Factor Rule-Bases
    [Details] [PDF]
    J. Jeffrey Mahoney and Raymond J. Mooney
    In Proceedings of the Eleventh International Workshop on Machine Learning (ML-94), 173--180, Rutgers, NJ, July 1994.
    This paper compares two methods for refining uncertain knowledge bases using propositional certainty-factor rules. The first method, implemented in the RAPTURE system, employs neural-network training to refine the certainties of existing rules but uses a symbolic technique to add new rules. The second method, based on the one used in the KBANN system, initially adds a complete set of potential new rules with very low certainty and allows neural-network training to filter and adjust these rules. Experimental results indicate that the former method results in significantly faster training and produces much simpler refined rule bases with slightly greater accuracy.
    ML ID: 37
  262. Combining Top-Down And Bottom-Up Techniques In Inductive Logic Programming
    [Details] [PDF]
    John M. Zelle, Raymond J. Mooney, and Joshua B. Konvisser
    In Proceedings of the Eleventh International Workshop on Machine Learning (ML-94), 343--351, Rutgers, NJ, July 1994.
    This paper describes a new method for inducing logic programs from examples which attempts to integrate the best aspects of existing ILP methods into a single coherent framework. In particular, it combines a bottom-up method similar to GOLEM with a top-down method similar to FOIL. It also includes a method for predicate invention similar to CHAMP and an elegant solution to the ``noisy oracle'' problem which allows the system to learn recursive programs without requiring a complete set of positive examples. Systematic experimental comparisons to both GOLEM and FOIL on a range of problems are used to clearly demonstrate the advantages of the approach.
    ML ID: 36
  263. Inducing Deterministic Prolog Parsers From Treebanks: A Machine Learning Approach
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), 748--753, Seattle, WA, July 1994.
    This paper presents a method for constructing deterministic, context-sensitive, Prolog parsers from corpora of parsed sentences. Our approach uses recent machine learning methods for inducing Prolog rules from examples (inductive logic programming). We discuss several advantages of this method compared to recent statistical methods and present results on learning complete parsers from portions of the ATIS corpus.
    ML ID: 35
  264. Learning Qualitative Models for Systems with Multiple Operating Regions
    [Details] [PDF]
    Sowmya Ramachandran, Raymond J. Mooney, and Benjamin J. Kuipers
    In Proceedings of the Eighth International Workshop on Qualitative Reasoning about Physical Systems, Nara, Japan, 1994.
    The problem of learning qualitative models of physical systems from observations of its behaviour has been addressed by several researchers in recent years. Most current techniques limit themselves to learning a single qualitative differential equation to model the entire system. However, many systems have several qualitative differential equations underlying them. In this paper, we present an approach to learning the models for such systems. Our technique divides the behaviours into segments, each of which can be explained by a single qualitative differential equation. The qualitative model for each segment can be generated using any of the existing techniques for learning a single model. We show that results of applying our technique to several examples and demonstrate that it is effective.
    ML ID: 34
  265. Modifying Network Architectures For Certainty-Factor Rule-Base Revision
    [Details] [PDF]
    J. Jeffrey Mahoney and Raymond J. Mooney
    In Proceedings of the International Symposium on Integrating Knowledge and Neural Heuristics (ISIKNH-94), 75--85, Pensacola, FL, May 1994.
    This paper describes RAPTURE --- a system for revising probabilistic rule bases that converts symbolic rules into a connectionist network, which is then trained via connectionist techniques. It uses a modified version of backpropagation to refine the certainty factors of the rule base, and uses ID3's information-gain heuristic (Quinlan) to add new rules. Work is currently under way for finding improved techniques for modifying network architectures that include adding hidden units using the UPSTART algorithm (Frean). A case is made via comparison with fully connected connectionist techniques for keeping the rule base as close to the original as possible, adding new input units only as needed.
    ML ID: 33
  266. A Multistrategy Approach to Theory Refinement
    [Details] [PDF]
    Raymond J. Mooney and Dirk Ourston
    In Ryszard S. Michalski and G. Teccuci, editors, Machine Learning: A Multistrategy Approach, Vol. IV, 141-164, San Mateo, CA, 1994. Morgan Kaufmann.
    This chapter describes a multistrategy system that employs independent modules for deductive, abductive, and inductive reasoning to revise an arbitrarily incorrect propositional Horn-clause domain theory to fit a set of preclassified training instances. By combining such diverse methods, EITHER is able to handle a wider range of imperfect theories than other theory revision systems while guaranteeing that the revised theory will be consistent with the training data. EITHER has successfully revised two actual expert theories, one in molecular biology and one in plant pathology. The results confirm the hypothesis that using a multistrategy system to learn from both theory and data gives better results than using either theory or data alone.
    ML ID: 32
  267. Theory Refinement Combining Analytical and Empirical Methods
    [Details] [PDF]
    Dirk Ourston and Raymond J. Mooney
    Artificial Intelligence:311-344, 1994.
    This article describes a comprehensive approach to automatic theory revision. Given an imperfect theory, the approach combines explanation attempts for incorrectly classified examples in order to identify the failing portions of the theory. For each theory fault, correlated subsets of the examples are used to inductively generate a correction. Because the corrections are focused, they tend to preserve the structure of the original theory. Because the system starts with an approximate domain theory, in general fewer training examples are required to attain a given level of performance (classification accuracy) compared to a purely empirical system. The approach applies to classification systems employing a propositional Horn-clause theory. The system has been tested in a variety of application domains, and results are presented for problems in the domains of molecular biology and plant disease diagnosis.
    ML ID: 31
  268. Integrating ILP and EBL
    [Details] [PDF]
    Raymond J. Mooney and John M. Zelle
    Sigart Bulletin (special issue on Inductive Logic Programmming), 5(1):12-21, 1994.
    This paper presents a review of recent work that integrates methods from Inductive Logic Programming (ILP) and Explanation-Based Learning (EBL). ILP and EBL methods have complementary strengths and weaknesses and a number of recent projects have effectively combined them into systems with better performance than either of the individual approaches. In particular, integrated systems have been developed for guiding induction with prior knowledge (ML-SMART, FOCL, GRENDEL) refining imperfect domain theories (FORTE, AUDREY, Rx), and learning effective search-control knowledge (AxA-EBL, DOLPHIN).
    ML ID: 30
  269. Extending Theory Refinement to M-of-N Rules
    [Details] [PDF]
    Paul T. Baffes and Raymond J. Mooney
    Informatica, 17:387-397, 1993.
    In recent years, machine learning research has started addressing a problem known as theory refinement. The goal of a theory refinement learner is to modify an incomplete or incorrect rule base, representing a domain theory, to make it consistent with a set of input training examples. This paper presents a major revision of the EITHER propositional theory refinement system. Two issues are discussed. First, we show how run time efficiency can be greatly improved by changing from a exhaustive scheme for computing repairs to an iterative greedy method. Second, we show how to extend EITHER to refine MofN rules. The resulting algorithm, Neither (New EITHER), is more than an order of magnitude faster and produces significantly more accurate results with theories that fit the MofN format. To demonstrate the advantages of NEITHER, we present experimental results from two real-world domains.
    ML ID: 29
  270. Inductive Learning For Abductive Diagnosis
    [Details] [PDF]
    Cynthia A. Thompson
    Masters Thesis, Department of Computer Sciences, The University of Texas at Austin, Austin, TX, August 1993. 53 pages.
    A new system for learning by induction, called LAB, is presented. LAB (Learning for ABduction) learns abductive rules based on a set of training examples. Our goal is to find a small knowledge base which, when used abductively, diagnoses the training examples correctly, in addition to generalizing well to unseen examples. This is in contrast to past systems, which inductively learn rules which are used deductively. Abduction is particularly well suited to diagnosis, in which we are given a set of symptoms (manifestations) and we want our output to be a set of disorders which explain why the manifestations are present. Each training example is associated with potentially multiple categories, instead of one, which is the case with typical learning systems. Building the knowledge base requires a choice between multiple possibilities, and the number of possibilities grows exponentially with the number of training examples. One method of choosing the best knowledge base is described and implemented. The final system is experimentally evaluated, using data from the domain of diagnosing brain damage due to stroke. It is compared to other learning systems and a knowledge base produced by an expert. The results are promising: the rule base learned is simpler than the expert knowledge base and rules learned by one of the other systems, and the accuracy of the learned rule base in predicting which areas are damaged is better than all the other systems as well as the expert knowledge base.
    ML ID: 28
  271. Combining FOIL and EBG to Speed-Up Logic Programs
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1106-1111, 1993. San Francisco, CA: Morgan Kaufmann.
    This paper presents an algorithm that combines traditional EBL techniques and recent developments in inductive logic programming to learn effective clause selection rules for Prolog programs. When these control rules are incorporated into the original program, significant speed-up may be achieved. The algorithm is shown to be an improvement over competing EBL approaches in several domains. Additionally, the algorithm is capable of automatically transforming some intractable algorithms into ones that run in polynomial time.
    ML ID: 27
  272. Symbolic Revision of Theories With M-of-N Rules
    [Details] [PDF]
    Paul T. Baffes and Raymond J. Mooney
    In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93), 1135-1140, Chambery, France, August 1993.
    This paper presents a major revision of the EITHER propositional theory refinement system. Two issues are discussed. First, we show how run time efficiency can be greatly improved by changing from a exhaustive scheme for computing repairs to an iterative greedy method. Second, we show how to extend EITHER to refine M-of-N rules. The resulting algorithm, NEITHER (New EITHER), is more than an order of magnitude faster and produces significantly more accurate results with theories that fit the M-of-N format. To demonstrate the advantages of NEITHER, we present preliminary experimental results comparing it to EITHER and various other systems on refining the DNA promoter domain theory.
    ML ID: 26
  273. Learning Semantic Grammars With Constructive Inductive Logic Programming
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In Proceedings of the 11th National Conference on Artificial Intelligence, 817-822, 1993. Menlo Park, CA: AAAI Press.
    Automating the construction of semantic grammars is a difficult and interesting problem for machine learning. This paper shows how the semantic-grammar acquisition problem can be viewed as the learning of search-control heuristics in a logic program. Appropriate control rules are learned using a new first-order induction algorithm that automatically invents useful syntactic and semantic categories. Empirical results show that the learned parsers generalize well to novel sentences and out-perform previous approaches based on connectionist techniques.
    ML ID: 25
  274. Learning to Model Students: Using Theory Refinement to Detect Misconceptions
    [Details] [PDF]
    Paul T. Baffes
    June 1993. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    A new student modeling system called ASSERT is described which uses domain independent learning algorithms to model unique student errors and to automatically construct bug libraries. ASSERT consists of two learning phases. The first is an application of theory refinement techniques for constructing student models from a correct theory of the domain being tutored. The second learning cycle automatically constructs the bug library by extracting common refinements from multiple student models which are then used to bias future modeling efforts. Initial experimental data will be presented which suggests that ASSERT is a more effective modeling system than other induction techniques previously explored for student modeling, and that the automatic bug library construction significantly enhances subsequent modeling efforts.
    ML ID: 24
  275. Combining Connectionist and Symbolic Learning to Refine Certainty-Factor Rule-Bases
    [Details] [PDF]
    J. Jeffrey Mahoney and Raymond J. Mooney
    Connection Science:339-364, 1993.
    This paper describes Rapture --- a system for revising probabilistic knowledge bases that combines connectionist and symbolic learning methods. Rapture uses a modified version of backpropagation to refine the certainty factors of a Mycin-style rule base and it uses ID3's information gain heuristic to add new rules. Results on refining three actual expert knowledge bases demonstrate that this combined approach generally performs better than previous methods.
    ML ID: 23
  276. Integrating Theory and Data in Category Learning
    [Details] [PDF]
    Raymond J. Mooney
    In G. V. Nakamura and D. L. Medin and R. Taraban, editors, Categorization by Humans and Machines, 189-218, 1993.
    Recent results in both machine learning and cognitive psychology demonstrate that effective category learning involves an integration of theory and data. First, theories can bias induction, affecting what category definitions are extracted from a set of examples. Second, conflicting data can cause theories to be revised. Third, theories can alter the representation of data through feature formation. This chapter reviews two machine learning systems that attempt to integrate theory and data in one or more of these ways. IOU uses a domain theory to acquire part of a concept definition and to focus induction on the unexplained aspects of the data. EITHER uses data to revise an imperfect theory and uses theory to add abstract features to the data. Recent psychological experiments reveal that these machine learning systems exhibit several important aspects of human category learning. Specifically, IOU has been used to successfully model some recent experimental results on the effect of functional knowledge on category learning.
    ML ID: 22
  277. Learning Search-Control Heuristics for Logic Programs: Applications to Speedup Learning and Language Acquisition
    [Details] [PDF]
    John M. Zelle
    March 1993. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    This paper presents a general framework, learning search-control heuristics for logic programs, which can be used to improve both the efficiency and accuracy of knowledge-based systems expressed as definite-clause logic programs. The approach combines techniques of explanation-based learning and recent advances in inductive logic programming to learn clause-selection heuristics that guide program execution. Two specific applications of this framework are detailed: dynamic optimization of Prolog programs (improving efficiency) and natural language acquisition (improving accuracy). In the area of program optimization, a prototype system, DOLPHIN, is able to transform some intractable specifications into polynomial-time algorithms, and outperforms competing approaches in several benchmark speedup domains. A prototype language acquisition system, CHILL, is also described. It is capable of automatically acquiring semantic grammars, which uniformly incorprate syntactic and semantic constraints to parse sentences into case-role representations. Initial experiments show that this approach is able to construct accurate parsers which generalize well to novel sentences and significantly outperform previous approaches to learning case-role mapping based on connectionist techniques. Planned extensions of the general framework and the specific applications as well as plans for further evaluation are also discussed.
    ML ID: 21
  278. Induction Over the Unexplained: Using Overly-General Domain Theories to Aid Concept Learning
    [Details] [PDF]
    Raymond J. Mooney
    Machine Learning, 10:79-110, 1993.
    This paper describes and evaluates an approach to combining empirical and explanation-based learning called Induction Over the Unexplained (IOU). IOU is intended for learning concepts that can be partially explained by an overly-general domain theory. An eclectic evaluation of the method is presented which includes results from all three major approaches: empirical, theoretical, and psychological. Empirical results shows that IOU is effective at refining overly-general domain theories and that it learns more accurate concepts from fewer examples than a purely empirical approach. The application of theoretical results from PAC learnability theory explains why IOU requires fewer examples. IOU is also shown to be able to model psychological data demonstrating the effect of background knowledge on human learning.
    ML ID: 20
  279. An Operator-Based Approach to First-Order Theory Revision
    [Details] [PDF]
    Bradley Lance Richards
    PhD Thesis, Department of Computer Science, University of Texas at Austin, August 1992.

    Knowledge acquisition is a difficult and time-consuming task, and as error-prone as any human activity. Thus, knowledge bases must be maintained, as errors and omissions are discovered. To address this task, recent learning systems have combined inductive and explanation-based techniques to produce a new class of systems performing theory revision. When errors are discovered in a knowledge base, theory revision allows automatic self-repair, eliminating the need to recall the knowledge engineer and domain expert.

    To date, theory revision systems have been limited to propositional domains. This thesis presents a system, FORTE (First-Order Revision of Theories from Examples), that performs theory revision in first-order domains. Moving to a first-order representation creates many new challenges, such as argument selection and recursion. But is also opens many new application areas, such as logic programming and qualitative modelling, that are beyond the reach of propositional systems.

    ML ID: 278
  280. A General Abductive system with application to plan recognition and diagnosis
    [Details] [PDF]
    Hwee Tou Ng
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, May 1992. 154 pages.
    A diverse set of intelligent activities, including natural language understanding, and scientific theory formation, requires the ability to construct explanations for observed phoenomena. In this thesis, we view explanation as abduction, where an abductive explanation is a consistent set of assumptions which, together with background knowledge, logically entials a set of observations.
    To explore the practical feasibility of such a general abductive approach to explanation, we have successfully built a domain-independent system called ACCEL. In our system, knowledge about a variety of domains in uniformly encoded in first-order Horn-clause axioms. A general-purpose abduction algorithm, AAA, efficiently constructs explanations in the various domians by caching partial explanations to avoid redundant work. Empirical results show that caching of partial explanations can achieve more than an order of magnitude speedup in run time. We have applied our abductive system to two general tasks: plan recognition in text understanding, and diagnosis of medical diseases, logic circuits, and dynamic systems. The results indicate that ACCEL is a general-purpose system capable of plan recognition and diagnosis, yet efficient enough to be of pratical utility.
    In the plan recognition domain, we defined a novel evaluation criterion, called explanatory coherence, and tested ACCEL on 50 short narrative texts. Empirical results demonstrate that coherence is a better evaluation metric than simplicity in the plan recognition domain, and that our system is sufficiently general to be able to handle similar plan recognition problems not known to the system developer in advance.
    In medical diagnosis, we prove that ACCEL computes the same diagnoses as the GSC model of Reggia, and present empirical results demonstrating the efficiency of ACCEL in diagnosing 50 real-world patient cases using a sizable knowledge base with over six hundred symptom-disease rules.
    ACCEL also realizes model-based diagnosis, which concerns inferring faults form first principles given knowledge about the correct structure and behavior of a system. ACCEL has successfully diagnosed logic circuits (a full adder) and dynamic systems (a proportional temperature controller and the water balance system of the human kidney).
    ML ID: 230
  281. Schema acquisition from a single example
    [Details] [PDF]
    W. Ahn, W. F. Brewer and Raymond J. Mooney
    Journal of Experimental Psychology: Learning, Memory, and Cognition, 18:391-412, 1992.
    This study compares similarity-based learning (SBL) and explanation-based learning (EBL) approaches to schema acquisition. In SBL approaches, concept formation is based on similarity across multiple examples. However, these approaches seem to be appropriate when the learner cannot apply existing knowledge and when the concepts to be learned are nonexplanatory. EBL approaches assume that a schema can be acquired from even a single example by constructing an explanation of the example using background knowledge, and generalizing the resulting explanation. However, unlike the current EBL theories, Exp 1 showed significant EBL occurred only when the background information learned during the experiment was actively used by the Ss. Exp 2 showed the generality of EBL mechanisms across a variety of materials and test procedures.
    ML ID: 212
  282. Abductive Plan Recognition and Diagnosis: A Comprehensive Empirical Evaluation
    [Details] [PDF]
    Hwee Tou Ng and Raymond J. Mooney
    In Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning, 499--508, Cambridge, MA, October 1992.
    While it has been realized for quite some time within AI that abduction is a general model of explanation for a variety of tasks, there have been no empirical investigations into the practical feasibility of a general, logic-based abductive approach to explanation. In this paper we present extensive empirical results on applying a general abductive system, ACCEL, to moderately complex problems in plan recognition and diagnosis. In plan recognition, ACCEL has been tested on 50 short narrative texts, inferring characters' plans from actions described in a text. In medical diagnosis, ACCEL has diagnosed 50 real-world patient cases involving brain damage due to stroke (previously addressed by set-covering methods). ACCEL also uses abduction to accomplish model-based diagnosis of logic circuits (a full adder) and continuous dynamic systems (a temperature controller and the water balance system of the human kidney). The results indicate that general purpose abduction is an effective and efficient mechanism for solving problems in plan recognition and diagnosis.
    ML ID: 19
  283. Speeding-up Logic Programs by Combining EBG and FOIL
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In Proceedings of the 1992 Machine Learning Workshop on Knowledge Compilation and Speedup Learning, Aberdeen, Scotland, July 1992.
    This paper presents an algorithm that combines traditional EBL techniques and recent developments in inductive logic programming to learn effective clause selection rules for Prolog programs. When these control rules are incorporated into the original program, significant speed-up may be achieved. The algorithm produces not only EBL-like speed up of problem solvers, but is capable of automatically transforming some intractable algorithms into ones that run in polynomial time.
    ML ID: 18
  284. Automatic Abduction of Qualitative Models
    [Details] [PDF]
    Bradley L. Richards, Ina Kraan, and Benjamin J. Kuipers
    In Proceedings of the Fifth International Workshop on Qualitative Reasoning about Physical Systems, 295-301, 1992.
    We describe a method of automatically abducing qualitative models from descriptions of behaviors. We generate, from either quantitative or qualitative data, models in the form of qualitative differential equations suitable for use by QSIM. Constraints are generated and filtered both by comparison with the input behaviors and by dimensional analysis. If the user provides complete information on the input behaviors and the dimensions of the input variables, the resulting model is unique, maximally constrainted, and guaranteed to reproduce the input behaviors. If the user provides incomplete information, our method will still generate a model which reproduces the input behaviors, but the model may no longer be unique. Incompleteness can take several forms: missing dimensions, values of variables, or entire variables.
    ML ID: 17
  285. Using Theory Revision to Model Students and Acquire Stereotypical Errors
    [Details] [PDF]
    Paul T. Baffes and Raymond J. Mooney
    In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, 617-622, Bloomington, IN, 1992.
    Student modeling has been identified as an important component to the long term development of Intelligent Computer-Aided Instruction (ICAI) systems. Two basic approaches have evolved to model student misconceptions. One uses a static, predefined library of user bugs which contains the misconceptions modeled by the system. The other uses induction to learn student misconceptions from scratch. Here, we present a third approach that uses a machine learning technique called theory revision. Using theory revision allows the system to automatically construct a bug library for use in modeling while retaining the flexibility to address novel errors.
    ML ID: 16
  286. Learning Relations by Pathfinding
    [Details] [PDF]
    Bradley L. Richards and Raymond J. Mooney
    In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), 50-55, San Jose, CA, July 1992.
    First-order learning systems (e.g., FOIL, FOCL, FORTE) generally rely on hill-climbing heuristics in order to avoid the combinatorial explosion inherent in learning first-order concepts. However, hill-climbing leaves these systems vulnerable to local maxima and local plateaus. We present a method, called relational pathfinding, which has proven highly effective in escaping local maxima and crossing local plateaus. We present our algorithm and provide learning results in two domains: family relationships and qualitative model building.
    ML ID: 15
  287. Combining Symbolic and Neural Learning to Revise Probabilistic Theories
    [Details] [PDF]
    J. Jeffrey Mahoney and Raymond J. Mooney
    In Proceedings of the ML92 Workshop on Integrated Learning in Real Domains, Aberdeen, Scotland, July 1992.
    This paper describes RAPTURE --- a system for revising probabilistic theories that combines symbolic and neural-network learning methods. RAPTURE uses a modified version of backpropagation to refine the certainty factors of a Mycin-style rule-base and it uses ID3's information gain heuristic to add new rules. Results on two real-world domains demonstrate that this combined approach performs as well or better than previous methods.
    ML ID: 14
  288. A First-Order Horn-Clause Abductive System and Its Use in Plan Recognition and Diagnosis
    [Details] [PDF]
    Hwee Tou Ng and Raymond J. Mooney
    June 1992. Unpublished Technical Note.
    A diverse set of intelligent activities, including natural language understanding and diagnosis, requires the ability to construct explanations for observed phenomena. In this paper, we view explanation as abduction, where an abductive explanation is a consistent set of assumptions which, together with background knowledge, logically entails a set of observations. We have successfully built a domain-independent system, ACCEL, in which knowledge about a variety of domains is uniformly encoded in first-order Horn-clause axioms. A general-purpose abduction algorithm, AAA, efficiently constructs explanations in the various domains by caching partial explanations to avoid redundant work. Empirical results show that caching of partial explanations can achieve more than an order of magnitude speedup in run time. We have applied our abductive system to two general tasks: plan recognition in text understanding, and diagnosis of medical diseases, logic circuits, and dynamic systems. The results indicate that ACCEL is a general-purpose system capable of plan recognition and diagnosis, yet efficient enough to be of practical utility.
    ML ID: 13
  289. Growing Layers of Perceptrons: Introducing the Extentron Algorithm
    [Details] [PDF]
    Paul T. Baffes and John M. Zelle
    In Proceedings of the 1992 International Joint Conference on Neural Networks, 392--397, Baltimore, MD, June 1992.
    The ideas presented here are based on two observations of perceptrons: (1) when the perceptron learning algorithm cycles among hyperplanes, the hyperplanes may be compared to select one that gives a best SPLIT of the examples, an d (2) it is always possible for the perceptron to build a hyperplane that separates at least one example from all the rest. We describe the Extentron, which grows multi-layer networks capable of distinguishing non-linearly-separable data using the simple perceptron rule for linear threshold units. The resulting algorithm is simple, very fast, scales well to large problems, retains the convergence properties of the perceptron, and can be completely specified using only two parameters. Results are presented comparing the Extentron to other neural network paradigms and to symbolic learning systems.
    ML ID: 12
  290. Automated Debugging of Logic Programs via Theory Revision
    [Details] [PDF]
    Raymond J. Mooney and Bradley L. Richards
    In Proceedings of the Second International Workshop on Inductive Logic Programming (ILP-92), Tokyo, Japan, 1992.
    This paper presents results on using a theory revision system to automatically debug logic programs. FORTE is a recently developed system for revising function-free Horn-clause theories. Given a theory and a set of training examples, it performs a hill-climbing search in an attempt to minimally modify the theory to correctly classify all of the examples. FORTE makes use of methods from propositional theory revision, Horn-clause induction (FOIL), and inverse resolution. The system has has been successfully used to debug logic programs written by undergraduate students for a programming languages course.
    ML ID: 11
  291. Belief Revision in the Context of Abductive Explanation
    [Details] [PDF]
    Siddarth Subramanian
    Technical Report AI92-179, Artificial Intelligence Laboratory, University of Texas, Austin, TX, December 1992.
    This proposal presents an approach to explanation that incorporates the paradigms of belief revision and abduction. We present an algorithm that combines these techniques and a system called BRACE that is a preliminary implementation of this algorithm. We show the applicability of the BRACE approach to a wide range of domains including scientific discovery, device diagnosis and plan recognition. Finally, we describe our proposals for a new implementation, new application domains for our system and extensions to this approach.
    ML ID: 10
  292. Batch versus Incremental Theory Refinement
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 1992 AAAI Spring Symposium on Knowledge Assimilation, Standford, CA, March 1992.
    Mosts existing theory refinement systems are not incremental. However, any theory refinement system whose input and output theories are compatible can be used to incrementally assimilate data into an evolving theory. This is done by continually feeding its revised theory back in as its input theory. An incremental batch approach, in which the system assimilates a batch of examples at each step, seems most appropriate for existing theory revision systems. Experimental results with the EITHER theory refinement system demonstrate that this approach frequently increases efficiency without significantly decreasing the accuracy or the simplicity of the resulting theory. However, if the system produces bad initial changes to the theory based on only small amount of data, these bad revisions can ``snowball'' and result in an overall decrease in performance.
    ML ID: 9
  293. Using Explanation-Based and Empirical Methods in Theory Revision
    [Details] [PDF]
    Dirk Ourston
    PhD Thesis, Department of Computer Science, University of Texas at Austin, 1991.

    The knowledge acquisition problem is a continuing problem in expert system development. The knowledge base (domain theory) initially formulated by the expert is usually only an approximation to the correct theory for the application domain. This initial knowledge base must be refined (usually manually) as problems are discovered. This research addresses the knowledge base refinement problem for classification tasks. The research provides an automatic method for correcting a domain theory in the light of incorrect performance on a set of training examples. The method uses attempted explanations to focus the correction on the failing part of the knowledge base. It then uses induction to supply a correction to the knoweledge base that will render it consistent with the training examples

    Using this technique, it is possible to correct overly general and overly specific theories, theories with multiple faults at various levels in the theory hierarchy, and theories involving multiple concepts. Methods have been developed for making corrections even in the presence of noisy data. Theoretical justification for the method is given in the form of convergence results that predict that the method will eventually converge to a hypothesis that is within a small error of the correct hypothesis, given sufficient examples. Because the technique currently relies on theorem proving for much of the analysis, it is quite expensive to computationally and heuristic methods for reducing the computational burden have been implemented.

    The system developed as part of the research is called EITHER (Explanation-based Inductive THeory Extension and Revision). EITHER uses propositional Horn clause logic as its knowledge representation, with examples expressed as attribute-value lists .The system has been tested in a variety of domains including revising a theory for the identification of promoters in DNA sequences and a theory for soybean disease diagnosis, where it has been shown to outperform a purely inductive approach.

    ML ID: 277
  294. First-Order Theory Revision
    [Details] [PDF]
    Bradley L. Richards and Raymond J. Mooney
    In Proceedings of the Eighth International Machine Learning Workshop, pp. 447-451, Evanston, IL, June 1991.
    Recent learning systems have combined explanation-based and inductive learning techniques to revise propositional domain theories (e.g. EITHER, RTLS, KBANN). Inductive systems working in first order logic have also been developed (e.g. CIGOL, FOIL, FOCL). This paper presents a theory revision system, Forte, that merges these two developments. Forte provides theory revision capabilities similar to those of the propositional systems, but works with domain theories stated in first-order logic.
    ML ID: 256
  295. Symbolic and Neural Learning Algorithms: An Experimental Comparison
    [Details] [PDF]
    J.W. Shavlik, Raymond J. Mooney and G. Towell
    Machine Learning, 6:111-143, 1991. Reprinted in {it Readings in Knowledge Acquisition and Learning}, Bruce G. Buchanan and David C. Wilkins (eds.), Morgan Kaufman, San Mateo, CA, 1993..
    Despite the fact that many symbolic and neural network (connectionist) learning algorithms address the same problem of learning from classified examples, very little is known regarding their comparative strengths and weaknesses. Experiments comparing the ID3 symbolic learning algorithm with the perception and backpropagation neural learning algorithms have been performed using five large, real-world data sets. Overall, backpropagation performs slightly better than the other two algorithms in terms of classification accuracy on new examples, but takes much longer to train. Experimental results suggest that backpropagation can work significantly better on data sets containing numerical data. Also analyzed empirically are the effects of (1) the amount of training data, (2) imperfect training examples, and (3) the encoding of the desired outputs. Backpropagation occasionally outperforms the other two systems when given relatively small amounts of training data. It is slightly more accurate than ID3 when examples are noisy or incompletely specified. Finally, backpropagation more effectively utilizes a distributed output encoding.
    ML ID: 8
  296. An Efficient First-Order Horn-Clause Abduction System Based on the ATMS
    [Details] [PDF]
    Hwee Tou Ng and Raymond J. Mooney
    In Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), 494-499, Anaheim, CA, July 1991.
    This paper presents an algorithm for first-order Horn-clause abduction that uses an ATMS to avoid redundant computation. This algorithm is either more efficient or more general than any other previous abduction algorithm. Since computing all minimal abductive explanations is intractable, we also present a heuristic version of the algorithm that uses beam search to compute a subset of the simplest explanations. We present empirical results on a broad range of abduction problems from text understanding, plan recognition, and device diagnosis which demonstrate that our algorithm is at least an order of magnitude faster than an alternative abduction algorithm that does not use an ATMS.
    ML ID: 7
  297. Improving Shared Rules in Multiple Category Domain Theories
    [Details] [PDF]
    Dirk Ourston and Raymond J. Mooney
    In Proceedings of the Eighth International Workshop on Machine Learning, 534-538, Evanston, IL, June 1991.
    This paper presents an approach to improving the classification performance of a multiple category theory by correcting intermediate rules which are shared among the categories. Using this technique, the performance of a theory in one category can be improved through training in an entirely different category. Examples of the technique are presented and experimental results are given.
    ML ID: 6
  298. Constructive Induction in Theory Refinement
    [Details] [PDF]
    Raymond J. Mooney and Dirk Ourston
    In Proceedings of the Eighth International Workshop on Machine Learning, 178-182, Evanston, IL, June 1991.
    This paper presents constructive induction techniques recently added to the EITHER theory refinement system. These additions allow EITHER to handle arbitrary gaps at the ``top,'' ``middle,'' and/or ``bottom'' of an incomplete domain theory. Intermediate concept utilization employs existing rules in the theory to derive higher-level features for use in induction. Intermediate concept creation employs inverse resolution to introduce new intermediate concepts in order to fill gaps in a theory that span multiple levels. These revisions allow EITHER to make use of imperfect domain theories in the ways typical of previous work in both constructive induction and theory refinement. As a result, EITHER is able to handle a wider range of theory imperfections than does any other existing theory refinement system.
    ML ID: 5
  299. Theory Refinement with Noisy Data
    [Details] [PDF]
    Raymond J. Mooney and Dirk Ourston
    Technical Report AI91-153, Artificial Intelligence Laboratory, University of Texas, Austin, TX, March 1991.
    This paper presents a method for revising an approximate domain theory based on noisy data. The basic idea is to avoid making changes to the theory that account for only a small amount of data. This method is implemented in the EITHER propositional Horn-clause theory revision system. The paper presents empirical results on artificially corrupted data to show that this method successfully prevents over-fitting. In other words, when the data is noisy, performance on novel test data is considerably better than revising the theory to completely fit the data. When the data is not noisy, noise processing causes no significant degradation in performance. Finally, noise processing increases efficiency and decreases the complexity of the resulting theory.
    ML ID: 4
  300. Changing the Rules: A Comprehensive Approach to Theory Refinement
    [Details] [PDF]
    D. Ourston and Raymond J. Mooney
    In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), 815-820, Boston, MA, July 1990.
    This paper presents a comprehensive approach to automatic theory refinement. In contrast to other systems, the approach is capable of modifying a theory which contains multiple faults and faults which occur at intermediate points in the theory. The approach uses explanations to focus the corrections to the theory, with the corrections being supplied by an inductive component. In this way, an attempt is made to preserve the structure of the original theory as much as possible. Because the approach begins with an approximate theory, learning an accurate theory takes fewer examples than a purely inductive system. The approach has application in expert system development, where an initial, approximate theory must be refined. The approach also applies at any point in the expert system lifecycle when the expert system generates incorrect results. The approach has been applied to the domain of molecular biology and shows significantly better results then a purely inductive learner.
    ML ID: 3
  301. On the Role of Coherence in Abductive Explanation
    [Details] [PDF]
    Hwee Tou Ng and Raymond J. Mooney
    In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), 337--342, Boston, MA, July 1990.
    Abduction is an important inference process underlying much of human intelligent activities, including text understanding, plan recognition, disease diagnosis, and physical device diagnosis. In this paper, we describe some problems encountered using abduction to understand text, and present some solutions to overcome these problems. The solutions we propose center around the use of a different criterion, called explanatory coherence, as the primary measure to evaluate the quality of an explanation. In addition, explanatory coherence plays an important role in the construction of explanations, both in determining the appropriate level of specificity of a preferred explanation, and in guiding the heuristic search to efficiently compute explanations of sufficiently high quality.
    ML ID: 2
  302. Learning Plan Schemata From Observation: Explanation-Based Learning for Plan Recognition
    [Details] [PDF]
    Raymond J. Mooney
    Cognitive Science, 14(4):483-509, 1990.
    This article discusses how explanation-based learning of plan schemata from observation can improve performance of plan recognition. The GENESIS program is presented as an implemented system for narrative text understanding that learns schemata and improves its performance. Learned schemata allow GENESIS to use schema-based understanding techniques when interpreting events and thereby avoid the expensive search associated with plan-based understanding. Learned schemata also function as new concepts that can be used to cluster examples and index events in memory. In addition. experiments are reviewed which demonstrate that human subjects, like GENESIS, can learn a schema by observing, explaining, and generalizing a single specific instance presented in a narrative.
    ML ID: 1
  303. Processing Issues in Comparisons of Symbolic and Connectionist Learning Systems
    [Details] [PDF]
    Douglas Fisher and Kathleen McKusick and Raymond J. Mooney and Jude W. Shavlik and Geoffrey Towell
    In Proceedings of the Sixth International Workshop on Machine Learning, 169--173, Ithaca, New York, 1989.
    Symbolic and connectionist learning strategies are receiving much attention. Comparative studies should qualify the advantages of systems from each paradigm. However, these systems make differing assumptions along several dimensions, thus complicating the design of 'fair' experimental comparisons. This paper describes our comparative studies of ID3 and back-propagation and suggests experimental dimensions that may be useful in cross-paradigm experimental design.
    ML ID: 275
  304. An Experimental Comparison of Symbolic and Connectionist Learning Algorithms
    [Details] [PDF]
    Raymond J. Mooney, J.W. Shavlik, G. Towell and A. Gove
    In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), 775-780, Detroit, MI, August 1989. Reprinted in ``Readings in Machine Learning'', Jude W. Shavlik and T. G. Dietterich (eds.), Morgan Kaufman, San Mateo, CA, 1990..
    Despite the fact that many symbolic and connectionist (neural net) learning algorithms are addressing the same problem of learning from classified examples, very little is known regarding their comparative strengths and weaknesses. This paper presents the results of experiments comparing the ID3 symbolic learning algorithm with the perceptron and back-propagation connectionist learning algorithms on several large real-world data sets. The results show that ID3 and perceptron run significantly faster than does back-propagation, both during learning and during classification of novel examples. However, the probability of correctly classifying new examples is about the same for the three systems. On noisy data sets there is some indication that back-propagation classifies more accurately.
    ML ID: 211
  305. The Effect of Rule Use on the Utility of Explanation-Based Learning
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 11th International Joint Conference on Artificial Intelligence, 725-730, 1989. San Francisco, CA: Morgan Kaufmann.
    The utility problem in explanation-based learning concerns the ability of learned rules or plans to actually improve the performance of a problem solving system. Previous research on this problem has focused on the amount, content, or form of learned information. This paper examines the effect of the use of learned information on performance. Experiments and informal analysis show that unconstrained use of learned rules eventually leads to degraded performance. However, constraining the use of learned rules helps avoid the negative effect of learning and lead to overall performance improvement. Search strategy is also shown to have a substantial effect on the contribution of learning to performance by affecting the manner in which learned rules are used. These effects help explain why previous experiments have obtained a variety of different results concerning the impact of explanation-based learning on performance.
    ML ID: 210
  306. Generalizing the Order of Operators in Macro-Operators
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the Fifth International Conference on Machine Learning (ICML-88), 270-283, Ann Arbor, MI, June 1988.
    A number of machine learning systems have been built which learn macro-operators or plan schemata, i.e. general compositions of actions which achieve a goal. However, previous research has not addressed the issue of generalizing the temporal order of operators and learning macro-operators with partially-ordered actions. This paper presents an algorithm for learning partially-ordered macro-operators which has been incorporated into the EGGS domain-independent explanation-based learning system. Examples from the domains of computer programming and narrative understanding are used to illustrate the performance of this system. These examples demonstrate that generalizing the order of operators can result in more general as well as more justified concepts. A theoretical analysis of the time complexity of the generalization algorithm is also presented.
    ML ID: 209
  307. A General Explanation-Based Learning Mechanism and its Application to Narrative Understanding
    [Details] [PDF]
    Raymond J. Mooney
    Ph.D. thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1988
    Explanation-based learning (EBL) is a learning method which uses existing knowledge of the domain to construct an explanation for why a specific example is a member of a concept or why a specific combination of actions achieves a goal. This explanation is then generalized in an analytical manner in order to produce a general concept description or plan schema. Although a number of exploratory EBL systems which operate in particular domains have previously been constructed, recent research in this area has lead to the development of general mechanisms which can perform explanation-based learning in a wide variety of domains.

    This thesis describes a general EBL mechanism, EGGS, which can make use of declarative knowledge stored in the form of Horn clauses, rewrite rules, or STRIPS operators. Numerous examples are presented illustrating its application to a wide variety of domains, including "blocks world" planning, logic circuit design, artifact recognition, and various forms of mathematical problem solving. The system is shown to improve its performance in each of these domains.

    EGGS has been most thoroughly tested as a component of a narrative understanding system, GENESIS, which improves its own performance through learning. GENESIS processes short English narratives and constructs explanations for characters' intentional behavior. When the system detects that a character has achieved an important goal by combining actions in an unfamiliar way, EGGS is used to generalize the specific explanation for how the goal was achieved into a general plan schema. The resulting schema is then retained by the system and indexed into its existing knowledge-base. This schema can then be used to process narratives which were previously beyond the system's capabilities. The thesis also discusses GENESIS' ability to learn meanings for words related to its learned schemata and reviews several recent psychological experiments which demonstrate that GENESIS can be productively interpreted as a cognitive model of certain types of human learning.

  308. Integrated Learning of Words and their Underlying Concepts
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, 947-978, Seattle, WA, July 1987.
    Models of learning word meanings have generally assumed prior knowledge of the concepts to which the words refer. However, novel natural language text or discourse often presents both unknown concepts and words which refer to these concepts. Also, developmental data suggests that the learning of words and their concepts frequently occurs concurrently instead of concept learning proceeding word learning. This paper presents an integrated computational model for acquiring both word meanings and their underlying concepts concurrently. This model is implemented as a word learning component added to the GENESIS explanation-based learning schema acquisition system for narrative understanding. A detailed example is described in which GENESIS learns provisional definitions for the words "kidnap", "kidnapper", and "ransom" as well as a kidnapping schema from a single narrative.
    ML ID: 208
  309. Schema Acquisition from One Example: Psychological Evidence for Explanation-Based Learning
    [Details] [PDF]
    W. Ahn, Raymond J. Mooney, W.F. Brewer and G.F. DeJong
    In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, 50-57, Seattle, WA, July 1987.
    Recent explanation-based learning (EBL) models in AI allow a computer program to learn a schema by analyzing a single example. For example, GENESIS is an EBL system which learns a plan schema from a single specific instance presented in a narrative. Previous learning models in both AI and psychology have required multiple examples. This paper presents experimental evidence that people can learn a plan schema from a single narrative and that the learned schema agrees with that predicted by EBL. This evidence suggests that GENESIS, originally constructed as a machine learning system, can be interpreted as a psychological model of learning a complex schema from a single example.
    ML ID: 207
  310. A Domain Independent Explanation-Based Generalizer
    [Details] [PDF]
    Raymond J. Mooney and S.W. Bennett
    In Proceedings of the Fifth National Conference on Artificial Intelligence (AAAI-86), 551-555, Philadelphia, PA, August 1986.
    A domain independent technique for generalizing a broad class of explanations is described. This method is compared and contrasted with other approaches to generalizing explanations, including an abstract version of the algorithm used in the STRIPS system and the EBG technique developed by Mitchell, Keller, and Kedar-Cabelli. We have tested this generalization technique on a number of examples in different domains, and present detailed descriptions of several of these.
    ML ID: 206
  311. Explanation-Based Learning: An Alternative View
    [Details] [PDF]
    G.F. DeJong and Raymond J. Mooney
    Machine Learning:145-176, 1986.
    In the last issue of this journal Mitchell, Keller, and Kedar-Cabelli presented a unifying framework for the explanation-based approach to machine learning. While it works well for a number of systems, the framework does not adequately capture certain aspects of the systems under development by the explanation-based learning group at Illinois. The primary inadequacies arise in the treatment of concept operationality, organization of knowledge into schemata, and learning from observation. This paper outlines six specific problems with the previously proposed framework and presents an alternative generalization method to perform explanation-based learning of new concepts.
  312. Generalizing Explanations of Narratives into Schemata
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the Third International Machine Learning Workshop, 126--128, New Brunswick, New Jersey, 1985.
    This paper describes a natural language system which improves its performance through learning. The system processes short English narratives and from a single narrative acquires a new schema for a stereotypical set of actions. During the understanding process, the system constructs explanations for characters' actions in terms of the goals they were meant to achieve. If a character achieves a common goal in a novel way, it generalizes the set of actions used to achieve this goal into a new schema. The generalization process is a knowledge-based analysis of the narrative's causal structure which removes unnecessary details while maintaining the validity of the explanation. The resulting generalized set of actions is then stored as a new schema and used by the system to process narratives which were previously beyond its capabilities.
    ML ID: 276
  313. Learning Schemata for Natural Language Processing
    [Details] [PDF]
    Raymond J. Mooney and Gerald F. DeJong
    In Proceedings of the Ninth International Joint Conference on Artificial Intelligence (IJCAI-85), 681-687, Los Angeles, CA, August 1985.
    This paper describes a natural language system which improves its own performance through learning. The system processes short English narratives and is able to acquire, from a single narrative, a new schema for a stereotypical set of actions. During the understanding process, the system attempts to construct explanations for characters' actions in terms of the goals their actions were meant to achieve. When the system observes that a character has achieved an interesting goal in a novel way, it generalizes the set of actions they used to achieve this goal into a new schema. The generalization process is a knowledge-based analysis of the causal structure of the narrative which removes unnecessary details while maintaining the validity of the causal explanation. The resulting generalized set of actions is then stored as a new schema and used by the system to correctly process narratives which were previously beyond its capabilities.
    ML ID: 205
  314. Generalizing Explanations of Narratives into Schemata
    [Details] [PDF]
    Raymond J. Mooney
    Masters Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1985.
    This thesis describes a natural language system called GENESIS which improves its own performance through learning. The system processes short English narratives and is able to acquire, from a single narrative, a new schema for a stereotypical set of actions. During the understanding process, the system attempts to construct explanations for characters' actions in terms of the goals their actions were meant to achieve. When the system observes that a character in a narrative has achieved an interesting goal in a novel way, it generalizes the set of actions they used to achieve this goal into a new schema. The generalization process is a knowledge-based analysis of the causal structure of the narrative which removes unnecessary details while maintaining the validity of the causal explanation. The resulting generalized combination of actions is then stored as a new schema in the system's knowledge base. This new schema can then be used by the system to correctly process narratives which were previously beyond its capabilities.