Machine Learning Research Group | University of Texas

Publications: Explainable AI

AI systems’ ability to explain their reasoning is critical to their utility, since human users do not trust decisions made by opaque “black boxes.” Explainable AI studies the development of systems that provide visual, textual, or multimodal explanations that elucidate the reasoning behind their decisions.


Towards Automated Error Analysis: Learning to Characterize Errors
[Details] [PDF] [Poster]
Tong Gao, Shivang Singh, Raymond J. Mooney
A short version appears in the 35th International Florida Artificial Intelligence Research Society Conference (FLAIRS), May 2022.
Characterizing the patterns of errors that a system makes helps researchers focus future development on increasing its accuracy and robustness. We propose a novel form of “meta-learning” that automatically learns interpretable rules characterizing the types of errors a system makes, and demonstrate these rules’ ability to help understand and improve two NLP systems. Our approach works by collecting error cases on validation data, extracting meta-features that describe these samples, and finally learning rules that characterize the errors using these features. We apply our approach to ViLBERT for Visual Question Answering and RoBERTa for Commonsense Question Answering. Our system learns interpretable rules that provide insight into the systematic errors these systems make on the given tasks. Using these insights, we are also able to “close the loop” and modestly improve the performance of these systems.
ML ID: 400
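The three-step pipeline described above (collect validation errors, extract meta-features, learn rules) can be illustrated with a deliberately simple rule learner. This sketch is not the authors' method: it assumes categorical meta-features and searches only single-condition rules of the form "feature == value", keeping those whose error rate is well above the overall error rate. The function name learn_error_rules, its thresholds, and the meta-feature names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (feature name, feature value, error rate, support)
Rule = Tuple[str, str, float, int]


def learn_error_rules(samples: List[Dict[str, str]], is_error: List[bool],
                      min_support: int = 2, min_lift: float = 1.5) -> List[Rule]:
    """Return single-condition rules "feature == value" whose error rate is
    at least min_lift times the overall error rate on the validation data."""
    base_rate = sum(is_error) / len(is_error)
    if base_rate == 0:
        return []  # no errors, nothing to characterize
    counts = defaultdict(lambda: [0, 0])  # (feature, value) -> [errors, total]
    for features, err in zip(samples, is_error):
        for name, value in features.items():
            counts[(name, value)][0] += int(err)
            counts[(name, value)][1] += 1
    rules = []
    for (name, value), (errors, total) in counts.items():
        rate = errors / total
        if total >= min_support and rate >= min_lift * base_rate:
            rules.append((name, value, rate, total))
    # Most error-prone conditions first; ties broken by larger support.
    rules.sort(key=lambda r: (-r[2], -r[3]))
    return rules


# Hypothetical meta-features for six validation questions.
samples = [
    {"question_type": "counting", "answer_type": "number"},
    {"question_type": "counting", "answer_type": "number"},
    {"question_type": "color", "answer_type": "other"},
    {"question_type": "color", "answer_type": "other"},
    {"question_type": "yes/no", "answer_type": "yes/no"},
    {"question_type": "yes/no", "answer_type": "yes/no"},
]
is_error = [True, True, False, True, False, False]

rules = learn_error_rules(samples, is_error)
for name, value, rate, support in rules:
    print(f"{name} == {value!r}: error rate {rate:.2f} over {support} samples")
```

Here the learned rules flag counting questions with numeric answers as the systematic weak spot, the kind of interpretable finding the abstract describes.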
Incorporating Textual Resources to Improve Visual Question Answering
[Details] [PDF] [Slides (PDF)]
Jialin Wu
September 2021. Ph.D. Proposal.
Recently, visual question answering (VQA) has emerged as a challenging multimodal task and gained popularity. The goal is to answer questions that query information associated with the visual content of a given image. Since the required information may come from both inside and outside the image, common types of visual features, such as object and attribute detections, fail to provide enough information to answer the questions. Textual resources, such as captions, explanations, and encyclopedia articles, can help VQA systems comprehensively understand the image, follow the right reasoning path, and access external facts. Specifically, they provide concise descriptions of the image, precise reasons for the correct answer, and factual knowledge beyond the image. We present completed work on generating image captions that are targeted to help answer a specific visual question. We introduce an approach that generates textual explanations and uses these explanations to determine which answer is best supported. We use explanations to recognize the critical objects for solving the visual question and train VQA systems to be influenced most by these objects. We also explore using textual resources to provide the external knowledge beyond the visual content that is indispensable for the recent trend toward knowledge-based VQA. We further propose to break down visual questions so that each segment, which carries a single piece of semantic content in the question, can be associated with its specific knowledge. This separation aims to help the VQA system understand the question structure, so that it can link different aspects of the question to different types of information within and beyond the image.
ML ID: 397
Improving VQA and its Explanations by Comparing Competing Explanations
[Details] [PDF] [Slides (PDF)]
Jialin Wu, Liyan Chen, Raymond J. Mooney
In the AAAI Conference on Artificial Intelligence (AAAI), Explainable Agency in Artificial Intelligence Workshop, February 2021.
Most recent state-of-the-art Visual Question Answering (VQA) systems are opaque black boxes that are trained only to fit the answer distribution given the question and visual content. As a result, these systems frequently take shortcuts, focusing on simple visual concepts or question priors. This phenomenon becomes more problematic as questions become more complex and require more reasoning and commonsense knowledge. To address this issue, we present a novel framework that uses explanations for competing answers to help VQA systems select the correct answer. By training on human textual explanations, our framework builds better representations for the questions and visual content, and then reweights confidences in the answer candidates using either generated explanations or explanations retrieved from the training set. We evaluate our framework on the VQA-X dataset, which contains more difficult questions paired with human explanations, achieving new state-of-the-art results on both VQA and its explanations.
ML ID: 387
Self-Critical Reasoning for Robust Visual Question Answering
[Details] [PDF] [Slides (PDF)] [Poster]
Jialin Wu and Raymond J. Mooney
In Proceedings of Neural Information Processing Systems (NeurIPS), December 2019.
Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because of strong language priors, and they fail to generalize to test data with a significantly different question-answer (QA) distribution [1]. To address this issue, we introduce a self-critical training objective that ensures that the visual explanation of the correct answer matches the most influential image regions more closely than the explanations of competing answer candidates do. The influential regions are determined either from human visual/textual explanations or automatically from just the significant words in the question and answer. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state of the art of 49.5% using textual explanations and 48.5% using automatically annotated regions.
ML ID: 380
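A simplified hinge-style version of the self-critical objective can be written as follows. It assumes each candidate answer already has a scalar sensitivity to the most influential region; how that sensitivity is computed, and any weighting of the competing answers, is left out and may differ from the paper's formulation. The function name self_critical_loss is hypothetical.

```python
import numpy as np


def self_critical_loss(sensitivity: np.ndarray, correct: int) -> float:
    """Simplified hinge version of a self-critical objective.

    sensitivity[i] is a hypothetical precomputed score of how strongly answer
    candidate i depends on the most influential image region (for example,
    the gradient of its score with respect to that region's features, summed).
    Every competing answer whose sensitivity exceeds the correct answer's
    contributes the size of the excess; the correct answer contributes 0.
    """
    sensitivity = np.asarray(sensitivity, dtype=float)
    excess = sensitivity - sensitivity[correct]
    return float(np.maximum(excess, 0.0).sum())


# Candidate 1 is correct, but candidate 2 relies more on the key region.
scores = np.array([0.2, 0.5, 0.9])
print(self_critical_loss(scores, correct=1))  # penalized
print(self_critical_loss(scores, correct=2))  # no penalty
```

Minimizing this term pushes the correct answer to depend on the influential region at least as much as any competitor, which is the intuition the abstract states.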
Faithful Multimodal Explanation for Visual Question Answering
[Details] [PDF] [Slides (PPT)]
Jialin Wu and Raymond J. Mooney
In Proceedings of the Second BlackboxNLP Workshop at ACL, 103-112, Florence, Italy, August 2019.
AI systems’ ability to explain their reasoning is critical to their utility and trustworthiness. Deep neural networks have enabled significant progress on many challenging problems such as visual question answering (VQA). However, most such networks are opaque black boxes with limited explanatory capability. This paper presents a novel approach to developing a high-performing VQA system that can elucidate its answers with integrated textual and visual explanations that faithfully reflect important aspects of its underlying reasoning process while capturing the style of comprehensible human explanations. Extensive experimental evaluation demonstrates the advantages of this approach compared to competing methods using both automated metrics and human evaluation.
ML ID: 374
Do Human Rationales Improve Machine Explanations?
[Details] [PDF] [Poster]
Julia Strout, Ye Zhang, Raymond J. Mooney
In Proceedings of the Second BlackboxNLP Workshop at ACL, 56-62, Florence, Italy, August 2019.
Work on “learning with rationales” shows that humans providing explanations to a machine learning system can improve the system’s predictive accuracy. However, this work has not been connected to work in “explainable AI,” which concerns machines explaining their reasoning to humans. In this work, we show that learning with rationales can also improve the quality of the machine’s explanations as evaluated by human judges. Specifically, we present experiments showing that, for CNN-based text classification, explanations generated using “supervised attention” are judged superior to explanations generated using normal unsupervised attention.
ML ID: 373
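A common way to implement supervised attention is to add a loss term that pushes the model's attention distribution toward the human rationale. The sketch below assumes binary token-level rationale masks and uses cross-entropy against the normalized mask; during training it would be added, scaled by some weight, to the usual classification loss. The function names and the exact form of the loss are assumptions, not the paper's specification.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    shifted = np.exp(scores - scores.max())  # subtract max for numerical stability
    return shifted / shifted.sum()


def attention_supervision_loss(scores: np.ndarray, rationale: np.ndarray,
                               eps: float = 1e-12) -> float:
    """Cross-entropy between a human rationale (normalized into a target
    distribution over tokens) and the model's attention distribution."""
    attention = softmax(np.asarray(scores, dtype=float))
    target = np.asarray(rationale, dtype=float)
    target = target / target.sum()
    return float(-(target * np.log(attention + eps)).sum())


# Tokens 1 and 2 were marked by the annotator as the rationale.
rationale = np.array([0, 1, 1, 0])
aligned = attention_supervision_loss(np.array([0.0, 3.0, 3.0, 0.0]), rationale)
misaligned = attention_supervision_loss(np.array([3.0, 0.0, 0.0, 3.0]), rationale)
print(aligned, misaligned)  # attention on the rationale tokens gives the smaller loss
```

In contrast, normal unsupervised attention receives no such term and is shaped only by the classification objective.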
Explainable Improved Ensembling for Natural Language and Vision
[Details] [PDF] [Slides (PPT)] [Slides (PDF)]
Nazneen Rajani
PhD Thesis, Department of Computer Science, The University of Texas at Austin, July 2018.
Ensemble methods are well known in machine learning for improving prediction accuracy. However, they do not adequately discriminate among the underlying component models. How good a model is can sometimes be estimated from “why” it made a specific prediction. We propose a novel approach called Stacking With Auxiliary Features (SWAF) that effectively leverages component models by integrating such relevant information from context to improve ensembling. Using auxiliary features, our algorithm learns to rely on systems that agree not just on an output prediction but also on the source or origin of that output. We demonstrate our approach on challenging structured prediction problems in Natural Language Processing and Vision, including Information Extraction, Object Detection, and Visual Question Answering. We also present a variant of SWAF for combining systems that do not have training data, in an unsupervised ensemble, with systems that do have training data. Our combined approach obtains a new state of the art, beating our prior performance on Information Extraction. The state-of-the-art systems for many AI applications are ensembles of deep-learning models. These models are hard to interpret and can sometimes make odd mistakes. Explanations make AI systems more transparent and also justify their predictions. We propose a scalable approach to generate visual explanations for ensemble methods using the localization maps of the component systems. Crowdsourced human evaluation on two new metrics indicates that our ensemble’s explanation qualitatively outperforms the individual systems’ explanations by a significant margin.
ML ID: 364
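Stacking with auxiliary features can be sketched as a meta-classifier over candidate outputs whose inputs are the component systems' confidences plus extra features. In this sketch, the single auxiliary feature (the fraction of systems that also agree on the output's source) and plain logistic regression are assumptions chosen for illustration; SWAF's actual features and meta-learner may differ. All function names are hypothetical.

```python
import numpy as np


def build_features(confidences, aux):
    """Meta-classifier input: per-system confidences plus auxiliary features."""
    return np.concatenate([np.asarray(confidences, float), np.asarray(aux, float)])


def train_meta_classifier(X, y, lr=0.5, epochs=2000):
    """Logistic regression fit by batch gradient descent (bias appended)."""
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w


def accept_probability(w, x):
    """Probability that the candidate output should be accepted."""
    z = np.append(x, 1.0) @ w
    return float(1.0 / (1.0 + np.exp(-z)))


# Each candidate: confidences from three systems + one hypothetical auxiliary
# feature, the fraction of systems that also agree on the output's source.
X = np.array([
    build_features([0.9, 0.8, 0.9], [1.0]),
    build_features([0.9, 0.8, 0.9], [0.0]),
    build_features([0.6, 0.7, 0.5], [1.0]),
    build_features([0.6, 0.7, 0.5], [0.0]),
    build_features([0.8, 0.9, 0.7], [0.67]),
    build_features([0.8, 0.9, 0.7], [0.33]),
])
y = np.array([1, 0, 1, 0, 1, 0], dtype=float)

w = train_meta_classifier(X, y)
print(accept_probability(w, build_features([0.7, 0.7, 0.7], [1.0])))
print(accept_probability(w, build_features([0.7, 0.7, 0.7], [0.0])))
```

Because the two query candidates share identical confidences, the difference in their acceptance probabilities comes entirely from the auxiliary feature, a distinction that stacking on confidences alone cannot make.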
Ensembling Visual Explanations for VQA
[Details] [PDF] [Poster]
Nazneen Fatema Rajani, Raymond J. Mooney
In Proceedings of the NIPS 2017 workshop on Visually-Grounded Interaction and Language (ViGIL), December 2017.
Explanations make AI systems more transparent and also justify their predictions. The top-ranked Visual Question Answering (VQA) systems are ensembles of multiple systems; however, there has been no work on generating explanations for such ensembles. In this paper, we propose different methods for ensembling visual explanations for VQA using the localization maps of the component systems. Our crowdsourced human evaluation indicates that our ensemble visual explanation is superior to each individual system’s visual explanation, although the results vary depending on which individual system the ensemble is compared against, as well as on the number of individual systems that agree with the ensemble model’s answer. Overall, our ensemble explanation is better 63% of the time when compared to any individual system’s explanation. Our algorithm is also efficient and scales linearly in the number of component systems in the ensemble.
ML ID: 359
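One simple way to ensemble visual explanations is a weighted average of the localization maps from the component systems that agree with the ensemble's answer. The sketch assumes each system supplies a non-negative heat map of the same shape plus a confidence weight; it is one plausible variant, not necessarily any of the specific methods evaluated in the paper, and the function name is hypothetical. Like the algorithm described in the abstract, it makes a single pass over the systems, so its cost is linear in their number.

```python
import numpy as np


def ensemble_explanation(maps, answers, weights, ensemble_answer):
    """Weighted average of the (max-normalized) localization maps of the
    component systems whose answer agrees with the ensemble's answer."""
    combined = np.zeros(np.shape(maps[0]), dtype=float)
    total = 0.0
    for heat, answer, weight in zip(maps, answers, weights):
        if answer != ensemble_answer:
            continue  # disagreeing systems are ignored in this variant
        heat = np.asarray(heat, dtype=float)
        peak = heat.max()
        if peak > 0:
            heat = heat / peak  # put every map on a common [0, 1] scale
        combined += weight * heat
        total += weight
    return combined / total if total > 0 else combined


maps = [
    np.array([[1.0, 0.0], [0.0, 0.0]]),
    np.array([[0.5, 0.5], [0.0, 0.0]]),
    np.array([[0.0, 0.0], [0.0, 1.0]]),
]
answers = ["dog", "dog", "cat"]
weights = [0.6, 0.4, 0.9]
result = ensemble_explanation(maps, answers, weights, ensemble_answer="dog")
print(result)  # the "cat" system's map does not contribute
```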
Using Explanations to Improve Ensembling of Visual Question Answering Systems
[Details] [PDF] [Poster]
Nazneen Fatema Rajani and Raymond J. Mooney
In Proceedings of the IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 43-47, Melbourne, Australia, August 2017.
We present results on using explanations as auxiliary features to improve stacked ensembles for Visual Question Answering (VQA). VQA is a challenging task that requires systems to jointly reason about natural language and vision. We apply a recent ensembling approach, Stacking with Auxiliary Features (SWAF), which learns to combine the results of multiple systems, to VQA. We propose using features based on explanations to improve SWAF. Using explanations, we are able to improve the ensembling of three recent VQA systems.
ML ID: 346
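An explanation-based auxiliary feature could, for example, measure how similarly two component systems' localization maps rank the image regions. The sketch below uses Spearman rank correlation; that choice and the function names are assumptions for illustration, and the feature set used in the paper may differ. Such a score, averaged over the systems that give the same answer, could then be appended to the SWAF feature vector.

```python
import numpy as np


def ranks(values) -> np.ndarray:
    """0-based ranks of the flattened values (ties broken by position)."""
    flat = np.asarray(values, dtype=float).ravel()
    order = np.argsort(flat, kind="stable")
    result = np.empty(flat.size, dtype=float)
    result[order] = np.arange(flat.size)
    return result


def explanation_agreement(map_a, map_b) -> float:
    """Spearman rank correlation between two localization maps:
    +1 when both systems rank image regions identically, -1 when reversed."""
    return float(np.corrcoef(ranks(map_a), ranks(map_b))[0, 1])


map_a = np.array([[0.1, 0.4], [0.3, 0.9]])
map_b = np.array([[0.2, 0.5], [0.1, 0.8]])
print(explanation_agreement(map_a, map_a))   # identical maps
print(explanation_agreement(map_a, map_b))   # partly agreeing maps
print(explanation_agreement(map_a, -map_a))  # exactly reversed ranking
```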
Explaining Recommendations: Satisfaction vs. Promotion
[Details] [PDF]
Mustafa Bilgic and Raymond J. Mooney
In Proceedings of Beyond Personalization 2005: A Workshop on the Next Stage of Recommender Systems Research at the 2005 International Conference on Intelligent User Interfaces, San Diego, CA, January 2005.
Recommender systems have become a popular technique for helping users select desirable books, movies, music, and other items. Most research in the area has focused on developing and evaluating algorithms for efficiently producing accurate recommendations. However, the ability to effectively explain its recommendations to users is another important aspect of a recommender system. The only previous investigation of methods for explaining recommendations showed that certain styles of explanations were effective at convincing users to adopt recommendations (i.e., promotion) but failed to show that explanations actually helped users make more accurate decisions (i.e., satisfaction). We present two new methods for explaining the recommendations of content-based and/or collaborative systems and experimentally show that they actually improve users' estimation of item quality.
ML ID: 156
Explanation for Recommender Systems: Satisfaction vs. Promotion
[Details] [PDF]
Mustafa Bilgic
Austin, TX, May 2004. Undergraduate Honors Thesis, Department of Computer Sciences, University of Texas at Austin.
Much work has been done on recommender systems, which automate the recommendation process; however, there is little work on explaining recommendations. The only study we know of measured how much each explanation system increased users' acceptance of the recommended item (promotion). We took a different approach and measured which explanation system best helped users estimate the true quality of the item, so that they would be satisfied with their selection in the end (satisfaction).
ML ID: 142
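The satisfaction criterion above can be made concrete by comparing, for each user and item, the rating given from the explanation alone with the rating given after actually trying the item. The exact metric used in the thesis is not stated here; the mean and standard deviation of that difference, the function name satisfaction_stats, and the ratings below are assumptions for illustration.

```python
from statistics import mean, stdev


def satisfaction_stats(explanation_ratings, actual_ratings):
    """Mean and sample standard deviation of the difference between the rating
    a user gave from the explanation alone and the rating the same user gave
    after actually trying the item. Values closer to 0 mean the explanation
    let users estimate the item's true quality more accurately."""
    diffs = [e - a for e, a in zip(explanation_ratings, actual_ratings)]
    return mean(diffs), stdev(diffs)


# Hypothetical 1-5 ratings from four users for two explanation styles.
ratings = {
    "style_a": ([4, 5, 4, 3], [4, 4, 4, 3]),
    "style_b": ([5, 5, 5, 5], [3, 4, 2, 4]),
}
for style, (explained, actual) in ratings.items():
    avg, spread = satisfaction_stats(explained, actual)
    print(f"{style}: mean difference {avg:+.2f}, std {spread:.2f}")
```

In this toy data, style_b pushes every rating to 5, which might look good for promotion, but it overestimates quality by 1.75 points on average, while style_a stays close to the users' eventual judgments.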