Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Ensemble Learning

Ensemble Learning combines multiple learned models under the assumption that "two (or more) heads are better than one." The decisions of multiple hypotheses are combined in ensemble learning to produce more accurate results. Boosting and bagging are two popular approaches. Our work focuses on building diverse committees that are more effective than those built by existing methods, and, in particular, are useful for active learning.
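As a rough illustration of how the decisions of multiple hypotheses can be combined, the following sketch trains a small bagged committee and takes a majority vote. It is a minimal example under stated assumptions, not the method of any paper listed below: the one-dimensional threshold "stump" learner and the names `train_stump`, `bootstrap_sample`, and `bagged_predict` are hypothetical and chosen only to keep the example self-contained.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the committee's predictions."""
    return Counter(predictions).most_common(1)[0][0]


def bootstrap_sample(data, rng):
    """Draw len(data) examples from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Fit a hypothetical 1-D base learner: predict 1 when x >= threshold.

    The threshold is the observed x value with the fewest training errors.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in sample}):
        errors = sum((x >= threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_predict(thresholds, x):
    """Combine the committee's individual decisions by majority vote."""
    return majority_vote([int(x >= t) for t in thresholds])


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
    # Bagging: each committee member is trained on its own bootstrap sample.
    committee = [train_stump(bootstrap_sample(data, rng)) for _ in range(11)]
    print(bagged_predict(committee, 0.95), bagged_predict(committee, 0.05))
```

Each member sees a different resampling of the training set, which is one simple source of the diversity that the work below tries to create more directly.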

For a general, popular book on the utility of combining diverse, independent opinions in human decision-making, see The Wisdom of Crowds.

  1. Explainable Improved Ensembling for Natural Language and Vision
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    Nazneen Rajani
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, July 2018.
    Ensemble methods are well-known in machine learning for improving prediction accuracy. However, they do not adequately discriminate among underlying component models. The measure of how good a model is can sometimes be estimated from “why” it made a specific prediction. We propose a novel approach called Stacking With Auxiliary Features (SWAF) that effectively leverages component models by integrating such relevant information from context to improve ensembling. Using auxiliary features, our algorithm learns to rely on systems that not just agree on an output prediction but also the source or origin of that output. We demonstrate our approach to challenging structured prediction problems in Natural Language Processing and Vision including Information Extraction, Object Detection, and Visual Question Answering. We also present a variant of SWAF for combining systems that do not have training data in an unsupervised ensemble with systems that do have training data. Our combined approach obtains a new state-of-the-art, beating our prior performance on Information Extraction. The state-of-the-art systems on many AI applications are ensembles of deep-learning models. These models are hard to interpret and can sometimes make odd mistakes. Explanations make AI systems more transparent and also justify their predictions. We propose a scalable approach to generate visual explanations for ensemble methods using the localization maps of the component systems. Crowdsourced human evaluation on two new metrics indicates that our ensemble’s explanation significantly qualitatively outperforms individual systems’ explanations.
    ML ID: 364
  2. Stacking With Auxiliary Features for Visual Question Answering
    [Details] [PDF] [Poster]
    Nazneen Fatema Rajani, Raymond J. Mooney
    In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2217-2226, 2018.
    Visual Question Answering (VQA) is a well-known and challenging task that requires systems to jointly reason about natural language and vision. Deep learning models in various forms have been the standard for solving VQA. However, some of these VQA models are better at certain types of image-question pairs than other models. Ensembling VQA models intelligently to leverage their diverse expertise is, therefore, advantageous. Stacking With Auxiliary Features (SWAF) is an intelligent ensembling technique which learns to combine the results of multiple models using features of the current problem as context. We propose four categories of auxiliary features for ensembling for VQA. Three out of the four categories of features can be inferred from an image-question pair and do not require querying the component models. The fourth category of auxiliary features uses model-specific explanations. In this paper, we describe how we use these various categories of auxiliary features to improve performance for VQA. Using SWAF to effectively ensemble three recent systems, we obtain a new state-of-the-art. Our work also highlights the advantages of explainable AI models.
    ML ID: 360
  3. Ensembling Visual Explanations for VQA
    [Details] [PDF] [Poster]
    Nazneen Fatema Rajani, Raymond J. Mooney
    In Proceedings of the NIPS 2017 workshop on Visually-Grounded Interaction and Language (ViGIL), December 2017.
    Explanations make AI systems more transparent and also justify their predictions. The top-ranked Visual Question Answering (VQA) systems are ensembles of multiple systems; however, there has been no work on generating explanations for such ensembles. In this paper, we propose different methods for ensembling visual explanations for VQA using the localization maps of the component systems. Our crowd-sourced human evaluation indicates that our ensemble visual explanation is superior to each of the individual system’s visual explanation, although the results vary depending on the individual system that the ensemble is compared against as well as the number of individual systems that agree with the ensemble model’s answer. Overall, our ensemble explanation is better 63% of the time when compared to any individual system’s explanation. Our algorithm is also efficient and scales linearly in the number of component systems in the ensemble.
    ML ID: 359
  4. Using Explanations to Improve Ensembling of Visual Question Answering Systems
    [Details] [PDF] [Poster]
    Nazneen Fatema Rajani and Raymond J. Mooney
    In Proceedings of the IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 43-47, Melbourne, Australia, August 2017.
    We present results on using explanations as auxiliary features to improve stacked ensembles for Visual Question Answering (VQA). VQA is a challenging task that requires systems to jointly reason about natural language and vision. We present results applying a recent ensembling approach to VQA, Stacking with Auxiliary Features (SWAF), which learns to combine the results of multiple systems. We propose using features based on explanations to improve SWAF. Using explanations we are able to improve ensembling of three recent VQA systems.
    ML ID: 346
  5. Stacking With Auxiliary Features
    [Details] [PDF] [Slides (PDF)] [Poster]
    Nazneen Fatema Rajani and Raymond J. Mooney
    In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), 2634-2640, Melbourne, Australia, 2017.
    Ensembling methods are well known for improving prediction accuracy. However, they are limited in the sense that they cannot effectively discriminate among component models. In this paper, we propose stacking with auxiliary features that learns to fuse additional relevant information from multiple component systems as well as input instances to improve performance. We use two types of auxiliary features -- instance features and provenance features. The instance features enable the stacker to discriminate across input instances and the provenance features enable the stacker to discriminate across component systems. When combined together, our algorithm learns to rely on systems that not just agree on an output but also the provenance of this output in conjunction with the properties of the input instance. We demonstrate the success of our approach on three very different and challenging natural language and vision problems: Slot Filling, Entity Discovery and Linking, and ImageNet Object Detection. We obtain new state-of-the-art results on the first two tasks and significant improvements on the ImageNet task, thus verifying the power and generality of our approach.
    ML ID: 343
  6. Stacking With Auxiliary Features for Combining Supervised and Unsupervised Ensembles
    [Details] [PDF]
    Nazneen Fatema Rajani and Raymond J. Mooney
    In Proceedings of the Ninth Text Analysis Conference (TAC 2016), 2016.
    We propose stacking with auxiliary features (SWAF) that combines supervised and unsupervised methods to ensemble multiple systems for the Tri-lingual Entity Discovery and Linking (TEDL) 2016 evaluation. We use the TEDL 2015 systems for training and EDL 2016 systems for evaluating our algorithm. We perform a post-processing step on the outputs obtained from the classifier so as to aggregate into one final system.
    ML ID: 356
  7. Stacking With Auxiliary Features
    [Details] [PDF]
    Nazneen Fatema Rajani and Raymond J. Mooney
    ArXiv preprint arXiv:1605.08764, 2016.
    Ensembling methods are well known for improving prediction accuracy. However, they are limited in the sense that they cannot discriminate among component models effectively. In this paper, we propose stacking with auxiliary features that learns to fuse relevant information from multiple systems to improve performance. Auxiliary features enable the stacker to rely on systems that not just agree on an output but also the provenance of the output. We demonstrate our approach on three very different and difficult problems -- the Cold Start Slot Filling, the Tri-lingual Entity Discovery and Linking and the ImageNet object detection tasks. We obtain new state-of-the-art results on the first two tasks and substantial improvements on the detection task, thus verifying the power and generality of our approach.
    ML ID: 333
  8. Combining Supervised and Unsupervised Ensembles for Knowledge Base Population
    [Details] [PDF]
    Nazneen Fatema Rajani and Raymond J. Mooney
    To Appear In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), 2016.
    We propose an algorithm that combines supervised and unsupervised methods to ensemble multiple systems for two popular Knowledge Base Population (KBP) tasks, Cold Start Slot Filling (CSSF) and Tri-lingual Entity Discovery and Linking (TEDL). We demonstrate that it outperforms the best system for both tasks in the 2015 competition, several ensembling baselines, as well as a state-of-the-art stacking approach. The success of our technique on two different and challenging problems demonstrates the power and generality of our combined approach to ensembling.
    ML ID: 331
  9. Stacked Ensembles of Information Extractors for Knowledge-Base Population by Combining Supervised and Unsupervised Approaches
    [Details] [PDF] [Slides (PDF)]
    Nazneen Fatema Rajani and Raymond J Mooney
    In Proceedings of the Eighth Text Analysis Conference (TAC 2015), November 2015.
    The UTAustin team participated in two main tasks this year: the Cold Start Slot Filling (CSSF) task and the Slot-Filler Validation/Ensembling task, which was divided into the filtering and ensembling subtasks. Our system uses stacking to ensemble multiple systems for the KBP slot filling task, as described in our ACL 2015 paper. We expand the stacking approach by allowing the classifier to also utilize additional features that are relevant to making a final decision. Stacking relies on supervised training and hence requires common systems from the 2014 data to be used as training. However, that approach has limitations on performance and therefore we propose a novel approach of combining the supervised approach with an unsupervised approach on the remaining systems. We believe this combination approach gives our best run for the ensembling task. In this paper, we also discuss strategies to handle Cold Start data which comes from multiple hops.
    ML ID: 355
  10. Knowledge Transfer Using Latent Variable Models
    [Details] [PDF] [Slides (PDF)]
    Ayan Acharya
    PhD Thesis, Department of Electrical and Computer Engineering, The University of Texas at Austin, August 2015.
    In several applications, scarcity of labeled data is a challenging problem that hinders the predictive capabilities of machine learning algorithms. Additionally, the distribution of the data changes over time, rendering models trained with older data less capable of discovering useful structure from the newly available data. Transfer learning is a convenient framework to overcome such problems where the learning of a model specific to a domain can benefit the learning of other models in other domains through either simultaneous training of domains or sequential transfer of knowledge from one domain to the others. This thesis explores the opportunities of knowledge transfer in the context of a few applications pertaining to object recognition from images, text analysis, network modeling and recommender systems, using probabilistic latent variable models as building blocks. Both simultaneous and sequential knowledge transfer are achieved through the latent variables, either by sharing these across multiple related domains (for simultaneous learning) or by adapting their distributions to fit data from a new domain (for sequential learning).
    ML ID: 322
  11. Stacked Ensembles of Information Extractors for Knowledge-Base Population
    [Details] [PDF] [Slides (PPT)]
    Vidhoon Viswanathan and Nazneen Fatema Rajani and Yinon Bentor and Raymond J. Mooney
    In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-15), 177-187, Beijing, China, July 2015.
    We present results on using stacking to ensemble multiple systems for the Knowledge Base Population English Slot Filling (KBP-ESF) task. In addition to using the output and confidence of each system as input to the stacked classifier, we also use features capturing how well the systems agree about the provenance of the information they extract. We demonstrate that our stacking approach outperforms the best system from the 2014 KBP-ESF competition as well as alternative ensembling methods employed in the 2014 KBP Slot Filler Validation task and several other ensembling baselines. Additionally, we demonstrate that including provenance information further increases the performance of stacking.
    ML ID: 318
  12. Knowledge Base Population using Stacked Ensembles of Information Extractors
    [Details] [PDF]
    Vidhoon Viswanathan
    Masters Thesis, Department of Computer Science, The University of Texas at Austin, May 2015.
    The performance of relation extractors plays a significant role in the automatic creation of knowledge bases from web corpora. Using automated systems to create knowledge bases from the web is known as Knowledge Base Population (KBP). The Text Analysis Conference conducts English Slot Filling (ESF) and Slot Filler Validation (SFV) tasks as part of its KBP track to promote research in this area. Slot Filling systems are developed to do relation extraction for specific relation and entity types. Several participating universities have built Slot Filling systems that address different aspects of these tasks and employ different algorithms and techniques.

    In this thesis, we investigate the use of ensemble learning to combine the output of existing individual Slot Filling systems. We are the first to employ Stacking, a type of ensemble learning algorithm, for the task of ensembling Slot Filling systems for the KBP ESF and SFV tasks. Our approach builds an ensemble classifier that learns to meaningfully combine output from different Slot Filling systems and predict the correctness of extractions. Our experimental evaluation proves that Stacking is useful for ensembling SF systems. We demonstrate new state-of-the-art results for KBP ESF task. Our proposed system achieves an F1 score of 47.

    Given the complexity of developing Slot Filling systems from scratch, our promising results indicate that performance on Slot Filling tasks can be increased by ensembling existing systems in shorter timeframe. Our work promotes research and investigation into other methods for ensembling Slot Filling systems.

    ML ID: 315
  13. Creating Diverse Ensemble Classifiers to Reduce Supervision
    [Details] [PDF]
    Prem Melville
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, November 2005. 141 pages. Technical Report TR-05-49.
    Ensemble methods like Bagging and Boosting which combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. In this thesis, we present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-generated training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. The diverse ensembles produced by DECORATE are very effective for reducing the amount of supervision required for building accurate models. The first task we demonstrate this on is classification given a fixed training set. Experimental results using decision-tree induction as a base learner demonstrate that our approach consistently achieves higher predictive accuracy than the base classifier, Bagging and Random Forests. Also, DECORATE attains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. Additional experiments demonstrate DECORATE's resilience to imperfections in data, in the form of missing features, classification noise, and feature noise.
    DECORATE ensembles can also be used to reduce supervision through active learning, in which the learner selects the most informative examples from a pool of unlabeled examples, such that acquiring their labels will increase the accuracy of the classifier. Query by Committee is one effective approach to active learning in which disagreement within the ensemble of hypotheses is used to select examples for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting respectively, to build the committees. For efficient active learning it is critical that the committee be made up of consistent hypotheses that are very different from each other. Since DECORATE explicitly builds such committees, it is well-suited for this task. We introduce a new algorithm, Active-DECORATE, which uses DECORATE committees to select good training examples. Experimental results demonstrate that Active-DECORATE typically requires labeling fewer examples to achieve the same accuracy as Query by Bagging and Query by Boosting. Apart from optimizing classification accuracy, in many applications, producing good class probability estimates is also important, e.g., in fraud detection, which has unequal misclassification costs. This thesis introduces a novel approach to active learning based on Active-DECORATE which uses Jensen-Shannon divergence (a similarity measure for probability distributions) to improve the selection of training examples for optimizing probability estimation. Comprehensive experimental results demonstrate the benefits of our approach.
    Unlike the active learning setting, in many learning problems the class labels for all instances are known, but feature values may be missing and can be acquired at a cost. For building accurate predictive models, acquiring complete information for all instances is often quite expensive, while acquiring information for a random subset of instances may not be optimal. We formalize the task of active feature-value acquisition, which tries to reduce the cost of achieving a desired model accuracy by identifying instances for which obtaining complete information is most informative. We present an approach, based on DECORATE, in which instances are selected for acquisition based on the current model's accuracy and its confidence in the prediction. Experimental results demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions than random sampling.
    ML ID: 182
  14. Combining Bias and Variance Reduction Techniques for Regression
    [Details] [PDF]
    Yuk Lai Suen, Prem Melville and Raymond J. Mooney
    Technical Report UT-AI-TR-05-321, University of Texas at Austin, July 2005. www.cs.utexas.edu/~ml/publication.
    Gradient Boosting and bagging applied to regressors can reduce the error due to bias and variance respectively. Alternatively, Stochastic Gradient Boosting (SGB) and Iterated Bagging (IB) attempt to simultaneously reduce the contribution of both bias and variance to error. We provide an extensive empirical analysis of these methods, along with two alternate bias-variance reduction approaches --- bagging Gradient Boosting (BagGB) and bagging Stochastic Gradient Boosting (BagSGB). Experimental results demonstrate that SGB does not perform as well as IB or the alternate approaches. Furthermore, results show that, while BagGB and BagSGB perform competitively for low-bias learners, in general, Iterated Bagging is the most effective of these methods.
    ML ID: 178
  15. Combining Bias and Variance Reduction Techniques for Regression
    [Details] [PDF]
    Y. L. Suen, P. Melville and Raymond J. Mooney
    In Proceedings of the 16th European Conference on Machine Learning, 741-749, Porto, Portugal, October 2005.
    Gradient Boosting and bagging applied to regressors can reduce the error due to bias and variance respectively. Alternatively, Stochastic Gradient Boosting (SGB) and Iterated Bagging (IB) attempt to simultaneously reduce the contribution of both bias and variance to error. We provide an extensive empirical analysis of these methods, along with two alternate bias-variance reduction approaches --- bagging Gradient Boosting (BagGB) and bagging Stochastic Gradient Boosting (BagSGB). Experimental results demonstrate that SGB does not perform as well as IB or the alternate approaches. Furthermore, results show that, while BagGB and BagSGB perform competitively for low-bias learners, in general, Iterated Bagging is the most effective of these methods.
    ML ID: 173
  16. Diverse Ensembles for Active Learning
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    In Proceedings of 21st International Conference on Machine Learning (ICML-2004), 584-591, Banff, Canada, July 2004.
    Query by Committee is an effective approach to selective sampling in which disagreement amongst an ensemble of hypotheses is used to select data for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting, respectively, to build the committees. For effective active learning, it is critical that the committee be made up of consistent hypotheses that are very different from each other. DECORATE is a recently developed method that directly constructs such diverse committees using artificial training data. This paper introduces Active-DECORATE, which uses DECORATE committees to select good training examples. Extensive experimental results demonstrate that, in general, Active-DECORATE outperforms both Query by Bagging and Query by Boosting.
    ML ID: 146
  17. Experiments on Ensembles with Missing and Noisy Data
    [Details] [PDF]
    Prem Melville, Nishit Shah, Lilyana Mihalkova, and Raymond J. Mooney
    In F. Roli, J. Kittler, and T. Windeatt, editors, Lecture Notes in Computer Science: Proceedings of the Fifth International Workshop on Multi Classifier Systems (MCS-2004), 293-302, Cagliari, Italy, June 2004. Springer Verlag.
    One of the potential advantages of multiple classifier systems is an increased robustness to noise and other imperfections in data. Previous experiments on classification noise have shown that bagging is fairly robust but that boosting is quite sensitive. DECORATE is a recently introduced ensemble method that constructs diverse committees using artificial data. It has been shown to generally outperform both boosting and bagging when training data is limited. This paper compares the sensitivity of bagging, boosting, and DECORATE to three types of imperfect data: missing features, classification noise, and feature noise. For missing data, DECORATE is the most robust. For classification noise, bagging and DECORATE are both robust, with bagging being slightly better than DECORATE, while boosting is quite sensitive. For feature noise, all of the ensemble methods increase the resilience of the base classifier.
    ML ID: 143
  18. Creating Diversity in Ensembles Using Artificial Data
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    Journal of Information Fusion: Special Issue on Diversity in Multi Classifier Systems, 6(1):99-111, 2004.
    The diversity of an ensemble of classifiers is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than the base classifier, Bagging and Random Forests. DECORATE also obtains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets.
    ML ID: 139
  19. Creating Diverse Ensemble Classifiers
    [Details] [PDF]
    Prem Melville
    Technical Report UT-AI-TR-03-306, Department of Computer Sciences, University of Texas at Austin, December 2003. Ph.D. proposal.
    Ensemble methods like Bagging and Boosting which combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than both the base classifier and Bagging. DECORATE also obtains higher accuracy than Boosting early in the learning curve when training data is limited.
    We propose to show that DECORATE can also be effectively used for (1) active learning, to reduce the number of training examples required to achieve high accuracy; (2) exploiting unlabeled data to improve accuracy in a semi-supervised learning setting; (3) combining active learning with semi-supervision for improved results; (4) obtaining better class membership probability estimates; (5) reducing the error of regressors; and (6) improving the accuracy of relational learners.
    ML ID: 132
  20. Constructing Diverse Classifier Ensembles Using Artificial Training Examples
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-2003), 505-510, Acapulco, Mexico, August 2003.
    Ensemble methods like bagging and boosting that combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. This paper presents a new method for generating ensembles that directly constructs diverse hypotheses using additional artificially-constructed training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using decision-tree induction as a base learner demonstrate that this approach consistently achieves higher predictive accuracy than both the base classifier and bagging (whereas boosting can occasionally decrease accuracy), and also obtains higher accuracy than boosting early in the learning curve when training data is limited.
    ML ID: 122
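
Several abstracts above describe Query by Committee, in which disagreement within an ensemble of hypotheses is used to select examples for labeling. The sketch below illustrates that selection step only. It assumes vote entropy as the disagreement measure, which is one common choice and not necessarily the measure used in the papers listed here; the names `vote_entropy` and `select_query` and the threshold-classifier committee are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes; 0 when all members agree."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_query(committee, pool):
    """Pick the unlabeled example on which the committee disagrees most."""
    return max(
        pool,
        key=lambda x: vote_entropy([member(x) for member in committee]),
    )


def make_threshold_classifier(threshold):
    """Hypothetical committee member: predict 1 when x >= threshold."""
    return lambda x: int(x >= threshold)


if __name__ == "__main__":
    committee = [make_threshold_classifier(t) for t in (0.3, 0.5, 0.7)]
    pool = [0.1, 0.6, 0.9]
    # The members all agree on 0.1 and on 0.9 but split on 0.6.
    print(select_query(committee, pool))
```

A committee whose members are accurate yet differ from one another produces informative disagreement more often, which is why the active-learning entries above emphasize building diverse committees.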