Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Explanation-Based Learning

Most machine learning is focused on inductive generalization from empirical data and does not explicitly exploit prior knowledge of the domain. Explanation-based learning is a radically different approach that uses existing declarative domain knowledge to "explain" individual examples and then uses this explanation to drive a knowledge-based generalization of the example. It is therefore capable of learning a very general concept from only a single training example. Our work was some of the original research on this approach and led to our subsequent work on theory refinement and on learning for planning and problem-solving. The toy sketch below illustrates the core idea.
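As a rough illustration of this idea (a minimal toy sketch, not the group's actual EGGS or GENESIS systems; the "cup" domain theory, its predicates, and the object obj1 are invented for the example), explanation-based generalization reduces to two steps: prove that a single example satisfies the target concept using a Horn-clause domain theory, then variablize the proof's operational leaves into a general rule:

    # Hypothetical Horn-clause domain theory for the classic "cup" example.
    RULES = [
        (("cup", "?x"), [("liftable", "?x"), ("holds_liquid", "?x")]),
        (("liftable", "?x"), [("light", "?x"), ("has_handle", "?x")]),
        (("holds_liquid", "?x"), [("upward_concave", "?x")]),
    ]
    # A single training example, described by operational (directly
    # observable) facts about one object.
    FACTS = {("light", "obj1"), ("has_handle", "obj1"),
             ("upward_concave", "obj1")}

    def explain(goal, leaves):
        # Backward-chain to prove goal = (predicate, constant), collecting
        # the operational facts used as the leaves of the explanation.
        if goal in FACTS:
            leaves.append(goal)
            return True
        pred, const = goal
        for (head_pred, _), body in RULES:
            if head_pred == pred:
                return all(explain((p, const), leaves) for p, _ in body)
        return False

    leaves = []
    assert explain(("cup", "obj1"), leaves)
    # Variablize: the one example's constant becomes a variable, giving a
    # general rule justified by the explanation rather than by many examples:
    #   cup(?x) :- light(?x), has_handle(?x), upward_concave(?x)
    print([(p, "?x") for p, _ in leaves])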
  1. TellMeWhy: A Dataset for Answering Why-Questions in Narratives
    [Details] [PDF] [Slides (PDF)] [Video]
    Yash Kumar Lal, Nathanael Chambers, Raymond Mooney, Niranjan Balasubramanian
    In Findings of ACL 2021, August 2021.
    Answering questions about why characters perform certain actions is central to understanding and reasoning about narratives. Despite recent progress in QA, it is not clear if existing models have the ability to answer “why” questions that may require common-sense knowledge external to the input narrative. In this work, we introduce TellMeWhy, a new crowd-sourced dataset that consists of more than 30k questions and free-form answers concerning why characters in short narratives perform the actions described. For a third of this dataset, the answers are not present within the narrative. Given the limitations of automated evaluation for this task, we also present a systematized human evaluation interface for this dataset. Our evaluation of state-of-the-art models shows that they are far below human performance on answering such questions. They perform especially poorly on questions whose answers are external to the narrative, thus providing a challenge for future QA and narrative-understanding research.
    ML ID: 406
  2. Incorporating Textual Resources to Improve Visual Question Answering
    [Details] [PDF] [Slides (PDF)]
    Jialin Wu
    September 2021. Ph.D. Proposal.
    Recently, visual question answering (VQA) has emerged as a challenging multi-modal task and gained in popularity. The goal is to answer questions that query information associated with the visual content of the given image. Since the required information can come from both inside and outside the image, common types of visual features, such as object and attribute detections, fail to provide enough material for answering the questions. Textual resources, such as captions, explanations, and encyclopedia articles, can help VQA systems comprehensively understand the image, reason along the right path, and access external facts. Specifically, they provide concise descriptions of the image, precise reasons for the correct answer, and factual knowledge beyond the image. We presented completed work on generating image captions that are targeted to help answer a specific visual question. We introduced an approach that generates textual explanations and uses these explanations to determine which answer is best supported. We used explanations to recognize the critical objects for solving the visual question and trained the VQA systems to be most influenced by these objects. We also explored using textual resources to provide external knowledge beyond the visual content, which is indispensable for the recent trend toward knowledge-based VQA. We further propose to break down visual questions so that each segment, which carries a single piece of semantic content in the question, can be associated with its specific knowledge. This separation aims to help the VQA system understand the question structure in order to link different aspects of the question to different types of information within and beyond the image.
    ML ID: 397
  3. Stacking With Auxiliary Features for Visual Question Answering
    [Details] [PDF] [Poster]
    Nazneen Fatema Rajani, Raymond J. Mooney
    In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2217-2226, 2018.
    Visual Question Answering (VQA) is a well-known and challenging task that requires systems to jointly reason about natural language and vision. Deep learning models in various forms have been the standard for solving VQA. However, some of these VQA models are better at certain types of image-question pairs than other models. Ensembling VQA models intelligently to leverage their diverse expertise is, therefore, advantageous. Stacking With Auxiliary Features (SWAF) is an intelligent ensembling technique which learns to combine the results of multiple models using features of the current problem as context. We propose four categories of auxiliary features for ensembling for VQA. Three out of the four categories of features can be inferred from an image-question pair and do not require querying the component models. The fourth category of auxiliary features uses model-specific explanations. In this paper, we describe how we use these various categories of auxiliary features to improve performance for VQA. Using SWAF to effectively ensemble three recent systems, we obtain a new state-of-the-art. Our work also highlights the advantages of explainable AI models.
    ML ID: 360
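    As a rough illustration of the stacking approach described in the entry above (a schematic sketch only: the feature names, toy data, and the logistic-regression meta-model are stand-ins, not the paper's actual features or component systems), a SWAF-style meta-classifier scores each candidate answer using the component models' confidences plus auxiliary features of the problem itself:

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      # One row per (question, candidate answer): each component model's
      # confidence in the candidate plus auxiliary features describing the
      # instance itself, giving the meta-classifier problem context.
      #    m1    m2    m3  q_is_counting  answer_length
      X = np.array([
          [0.9, 0.8, 0.7, 0, 1],
          [0.2, 0.4, 0.1, 0, 4],
          [0.6, 0.9, 0.8, 1, 1],
          [0.3, 0.2, 0.5, 1, 3],
      ])
      y = np.array([1, 0, 1, 0])  # was the candidate answer correct?

      meta = LogisticRegression().fit(X, y)
      # At test time the ensemble returns, for each question, the candidate
      # with the highest meta-classifier probability of being correct.
      print(meta.predict_proba(X)[:, 1].round(2))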
  4. Bayesian Logic Programs for Plan Recognition and Machine Reading
    [Details] [PDF] [Slides (PPT)]
    Sindhu Raghavan
    PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2012. 170 pages.
    Several real-world tasks involve data that is uncertain and relational in nature. Traditional approaches like first-order logic and probabilistic models either deal with structured data or with uncertainty, but not both. To address these limitations, statistical relational learning (SRL), a new area of machine learning that integrates both first-order logic and probabilistic graphical models, has emerged in the recent past. The advantage of SRL models is that they can handle both uncertainty and structured/relational data. As a result, they are widely used in domains like social network analysis, biological data analysis, and natural language processing. Bayesian Logic Programs (BLPs), which integrate both first-order logic and Bayesian networks, are a powerful SRL formalism. In this dissertation, we develop approaches using BLPs to solve two real-world tasks -- plan recognition and machine reading.

    Plan recognition is the task of predicting an agent's top-level plans based on its observed actions. It is an abductive reasoning task that involves inferring cause from effect. In the first part of the dissertation, we develop an approach to abductive plan recognition using BLPs. Since BLPs employ logical deduction to construct the networks, they cannot be used effectively for abductive plan recognition as is. Therefore, we extend BLPs to use logical abduction to construct Bayesian networks and call the resulting model Bayesian Abductive Logic Programs (BALPs).

    In the second part of the dissertation, we apply BLPs to the task of machine reading, which involves automatic extraction of knowledge from natural language text. Most information extraction (IE) systems identify facts that are explicitly stated in text. However, much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. Human readers naturally use common sense knowledge and "read between the lines" to infer such implicit information from the explicitly stated facts. Since IE systems do not have access to common sense knowledge, they cannot perform deeper reasoning to infer implicitly stated facts. Here, we first develop an approach using BLPs to infer implicitly stated facts from natural language text. It involves learning uncertain common sense knowledge in the form of probabilistic first-order rules by mining a large corpus of automatically extracted facts using an existing rule learner. These rules are then used to derive additional facts from extracted information using BLP inference. We then develop an online rule learner that handles the concise, incomplete nature of natural-language text and learns first-order rules from noisy IE extractions. Finally, we develop a novel approach to calculate the weights of the rules using a curated lexical ontology like WordNet.

    Both tasks described above involve inference and learning from partially observed or incomplete data. In plan recognition, the underlying cause, i.e. the top-level plan that resulted in the observed actions, is not observed. Further, only a subset of the executed actions can be observed by the plan recognition system, resulting in partially observed data. Similarly, in machine reading, since some information is implicitly stated, such facts are rarely observed in the data. In this dissertation, we demonstrate the efficacy of BLPs for inference and learning from incomplete data. Experimental comparisons on various benchmark data sets for both tasks demonstrate the superior performance of BLPs over state-of-the-art methods.

    ML ID: 280
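    As a rough illustration of the abductive step described in the thesis above (a toy sketch only: the plan library and observations are invented, and BALPs build a Bayesian network over the abduced assumptions rather than using the simple coverage score shown here), plan recognition infers a cause from its observed effects:

      # Hypothetical plan library: top-level plan -> observable actions.
      PLAN_LIBRARY = {
          "shopping": {"go_to_store", "get_cart", "pay"},
          "robbing":  {"go_to_store", "draw_gun", "demand_money"},
      }
      observed = {"go_to_store", "get_cart"}

      # Abduce plan hypotheses: assume a top-level plan and score it by
      # how many of the observed actions it would explain.
      scores = {plan: len(actions & observed) / len(observed)
                for plan, actions in PLAN_LIBRARY.items()}
      best = max(scores, key=scores.get)
      print(best, scores[best])  # -> shopping 1.0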
  5. Integrating EBL and ILP to Acquire Control Rules for Planning
    [Details] [PDF]
    Tara A. Estlin and Raymond J. Mooney
    In Proceedings of the Third International Workshop on Multi-Strategy Learning (MSL-96), 271--279, Harpers Ferry, WV, May 1996.
    Most approaches to learning control information in planning systems use explanation-based learning (EBL) to generate control rules. Unfortunately, EBL alone often produces overly complex rules that actually decrease planning efficiency. This paper presents a novel learning approach for control knowledge acquisition that integrates explanation-based learning with techniques from inductive logic programming. EBL is used to constrain an inductive search for selection heuristics that help a planner choose between competing plan refinements. The resulting system, SCOPE, is one of the few to address learning control information for the newer partial-order planners; specifically, it learns domain-specific control rules for a version of the UCPOP planning algorithm. SCOPE is shown to produce significant speedup in two different planning domains.
    ML ID: 60
  6. Integrating ILP and EBL
    [Details] [PDF]
    Raymond J. Mooney and John M. Zelle
    SIGART Bulletin (special issue on Inductive Logic Programming), 5(1):12-21, 1994.
    This paper presents a review of recent work that integrates methods from Inductive Logic Programming (ILP) and Explanation-Based Learning (EBL). ILP and EBL methods have complementary strengths and weaknesses, and a number of recent projects have effectively combined them into systems with better performance than either of the individual approaches. In particular, integrated systems have been developed for guiding induction with prior knowledge (ML-SMART, FOCL, GRENDEL), refining imperfect domain theories (FORTE, AUDREY, Rx), and learning effective search-control knowledge (AxA-EBL, DOLPHIN).
    ML ID: 30
  7. Combining FOIL and EBG to Speed-Up Logic Programs
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1106-1111, 1993. San Francisco, CA: Morgan Kaufmann.
    This paper presents an algorithm that combines traditional EBL techniques and recent developments in inductive logic programming to learn effective clause selection rules for Prolog programs. When these control rules are incorporated into the original program, significant speed-up may be achieved. The algorithm is shown to be an improvement over competing EBL approaches in several domains. Additionally, the algorithm is capable of automatically transforming some intractable algorithms into ones that run in polynomial time.
    ML ID: 27
  8. Integrating Theory and Data in Category Learning
    [Details] [PDF]
    Raymond J. Mooney
    In G. V. Nakamura and D. L. Medin and R. Taraban, editors, Categorization by Humans and Machines, 189-218, 1993.
    Recent results in both machine learning and cognitive psychology demonstrate that effective category learning involves an integration of theory and data. First, theories can bias induction, affecting what category definitions are extracted from a set of examples. Second, conflicting data can cause theories to be revised. Third, theories can alter the representation of data through feature formation. This chapter reviews two machine learning systems that attempt to integrate theory and data in one or more of these ways. IOU uses a domain theory to acquire part of a concept definition and to focus induction on the unexplained aspects of the data. EITHER uses data to revise an imperfect theory and uses theory to add abstract features to the data. Recent psychological experiments reveal that these machine learning systems exhibit several important aspects of human category learning. Specifically, IOU has been used to successfully model some recent experimental results on the effect of functional knowledge on category learning.
    ML ID: 22
  9. Induction Over the Unexplained: Using Overly-General Domain Theories to Aid Concept Learning
    [Details] [PDF]
    Raymond J. Mooney
    Machine Learning, 10:79-110, 1993.
    This paper describes and evaluates an approach to combining empirical and explanation-based learning called Induction Over the Unexplained (IOU). IOU is intended for learning concepts that can be partially explained by an overly-general domain theory. An eclectic evaluation of the method is presented which includes results from all three major approaches: empirical, theoretical, and psychological. Empirical results show that IOU is effective at refining overly-general domain theories and that it learns more accurate concepts from fewer examples than a purely empirical approach. The application of theoretical results from PAC learnability theory explains why IOU requires fewer examples. IOU is also shown to be able to model psychological data demonstrating the effect of background knowledge on human learning.
    ML ID: 20
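    As a rough illustration of the IOU idea in the entry above (a minimal sketch over invented feature sets, not the IOU system itself), features entailed by the overly-general theory are removed, and ordinary induction runs only on the unexplained residue:

      # Hypothetical feature-based examples; the domain theory explains
      # only 'light' and 'has_handle'.
      THEORY_FEATURES = {"light", "has_handle"}
      positives = [{"light", "has_handle", "upward_concave", "red"},
                   {"light", "has_handle", "upward_concave", "blue"}]
      negatives = [{"light", "has_handle", "flat_top", "red"}]

      def residue(example):
          # Drop the part of the example the theory already explains.
          return example - THEORY_FEATURES

      # Induce over the unexplained part: keep features present in every
      # positive residue and absent from every negative residue.
      unexplained = set.intersection(*map(residue, positives))
      for neg in negatives:
          unexplained -= residue(neg)
      print(unexplained)  # -> {'upward_concave'}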
  10. Schema acquisition from a single example
    [Details] [PDF]
    W. Ahn, W. F. Brewer and Raymond J. Mooney
    Journal of Experimental Psychology: Learning, Memory, and Cognition, 18:391-412, 1992.
    This study compares similarity-based learning (SBL) and explanation-based learning (EBL) approaches to schema acquisition. In SBL approaches, concept formation is based on similarity across multiple examples; these approaches seem appropriate when the learner cannot apply existing knowledge and when the concepts to be learned are nonexplanatory. EBL approaches assume that a schema can be acquired from even a single example by constructing an explanation of the example using background knowledge and generalizing the resulting explanation. However, contrary to current EBL theories, Experiment 1 showed that significant EBL occurred only when the background information learned during the experiment was actively used by the subjects. Experiment 2 showed the generality of EBL mechanisms across a variety of materials and test procedures.
    ML ID: 212
  11. Speeding-up Logic Programs by Combining EBG and FOIL
    [Details] [PDF]
    John M. Zelle and Raymond J. Mooney
    In Proceedings of the 1992 Machine Learning Workshop on Knowledge Compilation and Speedup Learning, Aberdeen, Scotland, July 1992.
    This paper presents an algorithm that combines traditional EBL techniques and recent developments in inductive logic programming to learn effective clause selection rules for Prolog programs. When these control rules are incorporated into the original program, significant speed-up may be achieved. The algorithm not only produces EBL-like speedup of problem solvers but is also capable of automatically transforming some intractable algorithms into ones that run in polynomial time.
    ML ID: 18
  12. Learning Plan Schemata From Observation: Explanation-Based Learning for Plan Recognition
    [Details] [PDF]
    Raymond J. Mooney
    Cognitive Science, 14(4):483-509, 1990.
    This article discusses how explanation-based learning of plan schemata from observation can improve performance of plan recognition. The GENESIS program is presented as an implemented system for narrative text understanding that learns schemata and improves its performance. Learned schemata allow GENESIS to use schema-based understanding techniques when interpreting events and thereby avoid the expensive search associated with plan-based understanding. Learned schemata also function as new concepts that can be used to cluster examples and index events in memory. In addition, experiments are reviewed which demonstrate that human subjects, like GENESIS, can learn a schema by observing, explaining, and generalizing a single specific instance presented in a narrative.
    ML ID: 1
  13. The Effect of Rule Use on the Utility of Explanation-Based Learning
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 11th International Joint Conference on Artificial Intelligence, 725-730, 1989. San Francisco, CA: Morgan Kaufmann.
    The utility problem in explanation-based learning concerns the ability of learned rules or plans to actually improve the performance of a problem-solving system. Previous research on this problem has focused on the amount, content, or form of learned information. This paper examines the effect of the use of learned information on performance. Experiments and informal analysis show that unconstrained use of learned rules eventually leads to degraded performance. However, constraining the use of learned rules helps avoid the negative effects of learning and leads to overall performance improvement. Search strategy is also shown to have a substantial effect on the contribution of learning to performance by affecting the manner in which learned rules are used. These effects help explain why previous experiments have obtained a variety of different results concerning the impact of explanation-based learning on performance.
    ML ID: 210
  14. Generalizing the Order of Operators in Macro-Operators
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the Fifth International Conference on Machine Learning (ICML-88), 270-283, Ann Arbor, MI, June 1988.
    A number of machine learning systems have been built which learn macro-operators or plan schemata, i.e., general compositions of actions that achieve a goal. However, previous research has not addressed the issue of generalizing the temporal order of operators and learning macro-operators with partially-ordered actions. This paper presents an algorithm for learning partially-ordered macro-operators which has been incorporated into the EGGS domain-independent explanation-based learning system. Examples from the domains of computer programming and narrative understanding are used to illustrate the performance of this system. These examples demonstrate that generalizing the order of operators can result in more general as well as more justified concepts. A theoretical analysis of the time complexity of the generalization algorithm is also presented.
    ML ID: 209
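    As a rough illustration of order generalization in the entry above (a toy sketch with invented step names and causal links; EGGS derives such constraints from full causal explanations, and transitive ordering constraints are omitted here for brevity), a partially-ordered macro-operator keeps only the orderings that causal links justify:

      # Observed total order of steps in the single training plan.
      steps = ["get_gun", "get_mask", "enter_bank", "point_gun"]
      # Orderings actually required by the plan's causal structure.
      causal_links = {("get_gun", "point_gun"), ("enter_bank", "point_gun")}

      # Any pair of steps not connected by a causal link stays unordered
      # in the learned macro-operator, so reuse can interleave them freely.
      unordered = [(a, b)
                   for i, a in enumerate(steps) for b in steps[i + 1:]
                   if (a, b) not in causal_links
                   and (b, a) not in causal_links]
      print(unordered)  # e.g., get_gun and get_mask need no relative order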
  15. A General Explanation-Based Learning Mechanism and its Application to Narrative Understanding
    [Details] [PDF]
    Raymond J. Mooney
    Ph.D. Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1988.
    Explanation-based learning (EBL) is a learning method which uses existing knowledge of the domain to construct an explanation for why a specific example is a member of a concept or why a specific combination of actions achieves a goal. This explanation is then generalized in an analytical manner in order to produce a general concept description or plan schema. Although a number of exploratory EBL systems which operate in particular domains have previously been constructed, recent research in this area has led to the development of general mechanisms which can perform explanation-based learning in a wide variety of domains.

    This thesis describes a general EBL mechanism, EGGS, which can make use of declarative knowledge stored in the form of Horn clauses, rewrite rules, or STRIPS operators. Numerous examples are presented illustrating its application to a wide variety of domains, including "blocks world" planning, logic circuit design, artifact recognition, and various forms of mathematical problem solving. The system is shown to improve its performance in each of these domains.

    EGGS has been most thoroughly tested as a component of a narrative understanding system, GENESIS, which improves its own performance through learning. GENESIS processes short English narratives and constructs explanations for characters' intentional behavior. When the system detects that a character has achieved an important goal by combining actions in an unfamiliar way, EGGS is used to generalize the specific explanation for how the goal was achieved into a general plan schema. The resulting schema is then retained by the system and indexed into its existing knowledge-base. This schema can then be used to process narratives which were previously beyond the system's capabilities. The thesis also discusses GENESIS' ability to learn meanings for words related to its learned schemata and reviews several recent psychological experiments which demonstrate that GENESIS can be productively interpreted as a cognitive model of certain types of human learning.

  16. Integrated Learning of Words and their Underlying Concepts
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, 947-978, Seattle, WA, July 1987.
    Models of learning word meanings have generally assumed prior knowledge of the concepts to which the words refer. However, novel natural language text or discourse often presents both unknown concepts and words which refer to these concepts. Also, developmental data suggest that the learning of words and their concepts frequently occurs concurrently instead of concept learning preceding word learning. This paper presents an integrated computational model for acquiring both word meanings and their underlying concepts concurrently. This model is implemented as a word learning component added to the GENESIS explanation-based learning schema acquisition system for narrative understanding. A detailed example is described in which GENESIS learns provisional definitions for the words "kidnap", "kidnapper", and "ransom" as well as a kidnapping schema from a single narrative.
    ML ID: 208
  17. Schema Acquisition from One Example: Psychological Evidence for Explanation-Based Learning
    [Details] [PDF]
    W. Ahn, Raymond J. Mooney, W.F. Brewer and G.F. DeJong
    In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, 50-57, Seattle, WA, July 1987.
    Recent explanation-based learning (EBL) models in AI allow a computer program to learn a schema by analyzing a single example. For example, GENESIS is an EBL system which learns a plan schema from a single specific instance presented in a narrative. Previous learning models in both AI and psychology have required multiple examples. This paper presents experimental evidence that people can learn a plan schema from a single narrative and that the learned schema agrees with that predicted by EBL. This evidence suggests that GENESIS, originally constructed as a machine learning system, can be interpreted as a psychological model of learning a complex schema from a single example.
    ML ID: 207
  18. A Domain Independent Explanation-Based Generalizer
    [Details] [PDF]
    Raymond J. Mooney and S.W. Bennett
    In Proceedings of the Fifth National Conference on Artificial Intelligence (AAAI-86), 551-555, Philadelphia, PA, August 1986.
    A domain independent technique for generalizing a broad class of explanations is described. This method is compared and contrasted with other approaches to generalizing explanations, including an abstract version of the algorithm used in the STRIPS system and the EBG technique developed by Mitchell, Keller, and Kedar-Cabelli. We have tested this generalization technique on a number of examples in different domains, and present detailed descriptions of several of these.
    ML ID: 206
  19. Explanation-Based Learning: An Alternative View
    [Details] [PDF]
    G.F. DeJong and Raymond J. Mooney
    Machine Learning, 1(2):145-176, 1986.
    In the last issue of this journal Mitchell, Keller, and Kedar-Cabelli presented a unifying framework for the explanation-based approach to machine learning. While it works well for a number of systems, the framework does not adequately capture certain aspects of the systems under development by the explanation-based learning group at Illinois. The primary inadequacies arise in the treatment of concept operationality, organization of knowledge into schemata, and learning from observation. This paper outlines six specific problems with the previously proposed framework and presents an alternative generalization method to perform explanation-based learning of new concepts.
  20. Learning Schemata for Natural Language Processing
    [Details] [PDF]
    Raymond J. Mooney and Gerald F. DeJong
    In Proceedings of the Ninth International Joint Conference on Artificial Intelligence (IJCAI-85), 681-687, Los Angeles, CA, August 1985.
    This paper describes a natural language system which improves its own performance through learning. The system processes short English narratives and is able to acquire, from a single narrative, a new schema for a stereotypical set of actions. During the understanding process, the system attempts to construct explanations for characters' actions in terms of the goals their actions were meant to achieve. When the system observes that a character has achieved an interesting goal in a novel way, it generalizes the set of actions they used to achieve this goal into a new schema. The generalization process is a knowledge-based analysis of the causal structure of the narrative which removes unnecessary details while maintaining the validity of the causal explanation. The resulting generalized set of actions is then stored as a new schema and used by the system to correctly process narratives which were previously beyond its capabilities.
    ML ID: 205
  21. Generalizing Explanations of Narratives into Schemata
    [Details] [PDF]
    Raymond J. Mooney
    Master's Thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1985.
    This thesis describes a natural language system called GENESIS which improves its own performance through learning. The system processes short English narratives and is able to acquire, from a single narrative, a new schema for a stereotypical set of actions. During the understanding process, the system attempts to construct explanations for characters' actions in terms of the goals their actions were meant to achieve. When the system observes that a character in a narrative has achieved an interesting goal in a novel way, it generalizes the set of actions they used to achieve this goal into a new schema. The generalization process is a knowledge-based analysis of the causal structure of the narrative which removes unnecessary details while maintaining the validity of the causal explanation. The resulting generalized combination of actions is then stored as a new schema in the system's knowledge base. This new schema can then be used by the system to correctly process narratives which were previously beyond its capabilities.