Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Active Learning

Active learning differs from passive "learning from examples" in that the learning algorithm itself attempts to select the most informative data for training. Since supervised labeling of data is expensive, active learning attempts to reduce the human effort needed to learn an accurate result by selecting only the most informative examples for labeling. Our work has focused on diverse ensembles for active learning and applications of active learning to problems in natural-language processing and semi-supervised learning. We have also addressed the problem of actively acquiring the most useful features values of examples as well as supervised class labels.
  1. Improving Grounded Natural Language Understanding through Human-Robot Dialog
    [Details] [PDF] [Poster]
    Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond J. Mooney
    In IEEE International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 2019.
    Natural language understanding for robotics can require substantial domain- and platform-specific engineering. For example, for mobile robots to pick-and-place objects in an environment to satisfy human commands, we can specify the language humans use to issue such commands, and connect concept words like red can to physical object properties. One way to alleviate this engineering for a new domain is to enable robots in human environments to adapt dynamically -- continually learning new language constructions and perceptual concepts. In this work, we present an end-to-end pipeline for translating natural language commands to discrete robot actions, and use clarification dialogs to jointly improve language parsing and concept grounding. We train and evaluate this agent in a virtual setting on Amazon Mechanical Turk, and we transfer the learned agent to a physical robot platform to demonstrate it in the real world.
    ML ID: 371
  2. Learning a Policy for Opportunistic Active Learning
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney
    In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP-18), Brussels, Belgium, November 2018.
    Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks.
    ML ID: 368
  3. Improved Models and Queries for Grounded Human-Robot Dialog
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    Aishwarya Padmakumar
    October 2018. PhD Proposal, Department of Computer Science, The University of Texas At Austin.
    The ability to understand and communicate in natural language can make robots much more accessible for naive users. Environments such as homes and offices contain many objects that humans describe in diverse language referencing perceptual properties. Robots operating in such environments need to be able to understand such descriptions. Different types of dialog interactions with humans can help robots clarify their understanding to reduce mistakes, and also improve their language understanding models, or adapt them to the specific domain of operation. We present completed work on jointly learning a dialog policy that enables a robot to clarify partially understood natural language commands, while simultaneously using the dialogs to improve the underlying semantic parser for future commands. We introduce the setting of opportunistic active learning - a framework for interactive tasks that use supervised models. This framework allows a robot to ask diverse, potentially off-topic queries across interactions, requiring the robot to trade-off between task completion and knowledge acquisition for future tasks. We also attempt to learn a dialog policy in this framework using reinforcement learning. We propose a novel distributional model for perceptual grounding, based on learning a joint space for vector representations from multiple modalities. We also propose a method for identifying more informative clarification questions that can scale well to a larger space of objects, and wish to learn a dialog policy that would make use of such clarifications.
    ML ID: 367
  4. Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
    [Details] [PDF]
    Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond J. Mooney
    In Late-breaking Track at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDIAL-18), Melbourne, Australia, July 2018.
    In this work, we present methods for parsing natural language to underlying meanings, and using robotic sensors to create multi-modal models of perceptual concepts. We combine these steps towards language understanding into a holistic agent for jointly improving parsing and perception on a robotic platform through human-robot dialog. We train and evaluate this agent on Amazon Mechanical Turk, then demonstrate it on a robotic platform initialized from that conversational data. Our experiments show that improving both parsing and perception components from conversations improves communication quality and human ratings of the agent.
    ML ID: 365
  5. Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
    [Details] [PDF] [Poster]
    Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond J. Mooney
    In RSS Workshop on Models and Representations for Natural Human-Robot Communication (MRHRC-18). Robotics: Science and Systems (RSS), June 2018.
    Natural language understanding in robots needs to be robust to a wide-range of both human speakers and human environments. Rather than force humans to use language that robots can understand, robots in human environments should dynamically adapt—continuously learning new language constructions and perceptual concepts as they are used in context. In this work, we present methods for parsing natural language to underlying meanings, and using robotic sensors to create multi-modal models of perceptual concepts. We combine these steps towards language understanding into a holistic agent for jointly improving parsing and perception on a robotic platform through human-robot dialog. We train and evaluate this agent on Amazon Mechanical Turk, then demonstrate it on a robotic platform initialized from conversational data gathered from Mechanical Turk. Our experiments show that improving both parsing and perception components from conversations improves communication quality and human ratings of the agent.
    ML ID: 363
  6. Opportunistic Active Learning for Grounding Natural Language Descriptions
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)] [Poster]
    Jesse Thomason and Aishwarya Padmakumar and Jivko Sinapov and Justin Hart and Peter Stone and Raymond J. Mooney
    In Sergey Levine and Vincent Vanhoucke and Ken Goldberg, editors, Proceedings of the 1st Annual Conference on Robot Learning (CoRL-17), 67--76, Mountain View, California, November 2017. PMLR.
    Active learning identifies data points from a pool of unlabeled examples whose labels, if made available, are most likely to improve the predictions of a supervised model. Most research on active learning assumes that an agent has access to the entire pool of unlabeled data and can ask for labels of any data points during an initial training phase. However, when incorporated in a larger task, an agent may only be able to query some subset of the unlabeled pool. An agent can also opportunistically query for labels that may be useful in the future, even if they are not immediately relevant. In this paper, we demonstrate that this type of opportunistic active learning can improve performance in grounding natural language descriptions of everyday objects---an important skill for home and office robots. We find, with a real robot in an object identification setting, that inquisitive behavior---asking users important questions about the meanings of words that may be off-topic for the current dialog---leads to identifying the correct object more often over time.
    ML ID: 350
  7. Knowledge Transfer Using Latent Variable Models
    [Details] [PDF] [Slides (PDF)]
    Ayan Acharya
    PhD Thesis, Department of Electrical and Computer Engineering, The University of Texas at Austin, August 2015.
    In several applications, scarcity of labeled data is a challenging problem that hinders the predictive capabilities of machine learning algorithms. Additionally, the distribution of the data changes over time, rendering models trained with older data less capable of discovering useful structure from the newly available data. Transfer learning is a convenient framework to overcome such problems where the learning of a model specific to a domain can benefit the learning of other models in other domains through either simultaneous training of domains or sequential transfer of knowledge from one domain to the others. This thesis explores the opportunities of knowledge transfer in the context of a few applications pertaining to object recognition from images, text analysis, network modeling and recommender systems, using probabilistic latent variable models as building blocks. Both simultaneous and sequential knowledge transfer are achieved through the latent variables, either by sharing these across multiple related domains (for simultaneous learning) or by adapting their distributions to fit data from a new domain (for sequential learning).
    ML ID: 322
  8. Active Multitask Learning Using Both Latent and Supervised Shared Topics
    [Details] [PDF] [Slides (PDF)]
    Ayan Acharya and Raymond J. Mooney and Joydeep Ghosh
    In Proceedings of the 2014 SIAM International Conference on Data Mining (SDM14), Philadelphia, Pennsylvania, April 2014.
    Multitask learning (MTL) via a shared representation has been adopted to alleviate problems with sparsity of labeled data across different learning tasks. Active learning, on the other hand, reduces the cost of labeling examples by making informative queries over an unlabeled pool of data. Therefore, a unification of both of these approaches can potentially be useful in settings where labeled information is expensive to obtain but the learning tasks or domains have some common characteristics. This paper introduces two such models -- Active Doubly Supervised Latent Dirichlet Allocation (Act-DSLDA) and its non-parametric variation (Act-NPDSLDA) that integrate MTL and active learning in the same framework. These models make use of both latent and supervised shared topics to accomplish multitask learning. Experimental results on both document and image classification show that integrating MTL and active learning along with shared latent and supervised topics is superior to other methods which do not employ all of these components.
    ML ID: 297
  9. Using Active Relocation to Aid Reinforcement Learning
    [Details] [PDF]
    Lilyana Mihalkova and Raymond Mooney
    In Prodeedings of the 19th International FLAIRS Conference (FLAIRS-2006), 580-585, Melbourne Beach, FL, May 2006.
    We propose a new framework for aiding a reinforcement learner by allowing it to relocate, or move, to a state it selects so as to decrease the number of steps it needs to take in order to develop an effective policy. The framework requires a minimal amount of human involvement or expertise and assumes a cost for each relocation. Several methods for taking advantage of the ability to relocate are proposed, and their effectiveness is tested in two commonly-used domains.
    ML ID: 166
  10. Creating Diverse Ensemble Classifiers to Reduce Supervision
    [Details] [PDF]
    Prem Melville
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, November 2005. 141 pages. Technical Report TR-05-49.
    Ensemble methods like Bagging and Boosting which combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. In this thesis, we present a new method for generating ensembles, DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples), that directly constructs diverse hypotheses using additional artificially-generated training examples. The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. The diverse ensembles produced by DECORATE are very effective for reducing the amount of supervision required for building accurate models. The first task we demonstrate this on is classification given a fixed training set. Experimental results using decision-tree induction as a base learner demonstrate that our approach consistently achieves higher predictive accuracy than the base classifier, Bagging and Random Forests. Also, DECORATE attains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. Additional experiments demonstrate DECORATE's resilience to imperfections in data, in the form of missing features, classification noise, and feature noise.
    DECORATE ensembles can also be used to reduce supervision through active learning, in which the learner selects the most informative examples from a pool of unlabeled examples, such that acquiring their labels will increase the accuracy of the classifier. Query by Committee is one effective approach to active learning in which disagreement within the ensemble of hypotheses is used to select examples for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting respectively, to build the committees. For efficient active learning it is critical that the committee be made up of consistent hypotheses that are very different from each other. Since DECORATE explicitly builds such committees, it is well-suited for this task. We introduce a new algorithm, Active-DECORATE, which uses DECORATE committees to select good training examples. Experimental results demonstrate that Active-DECORATE typically requires labeling fewer examples to achieve the same accuracy as Query by Bagging and Query by Boosting. Apart from optimizing classification accuracy, in many applications, producing good class probability estimates is also important, e.g., in fraud detection, which has unequal misclassification costs. This thesis introduces a novel approach to active learning based on Active-DECORATE which uses Jensen-Shannon divergence (a similarity measure for probability distributions) to improve the selection of training examples for optimizing probability estimation. Comprehensive experimental results demonstrate the benefits of our approach.
    Unlike the active learning setting, in many learning problems the class labels for all instances are known, but feature values may be missing and can be acquired at a cost. For building accurate predictive models, acquiring complete information for all instances is often quite expensive, while acquiring information for a random subset of instances may not be optimal. We formalize the task of active feature-value acquisition, which tries to reduce the cost of achieving a desired model accuracy by identifying instances for which obtaining complete information is most informative. We present an approach, based on DECORATE, in which instances are selected for acquisition based on the current model's accuracy and its confidence in the prediction. Experimental results demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions than random sampling.
    ML ID: 182
  11. An Expected Utility Approach to Active Feature-value Acquisition
    [Details] [PDF]
    P. Melville, M. Saar-Tsechansky, F. Provost and Raymond J. Mooney
    In Proceedings of the International Conference on Data Mining, 745-748, Houston, TX, November 2005.
    In many classification tasks training data have missing feature values that can be acquired at a cost. For building accurate predictive models, acquiring all missing values is often prohibitively expensive or unnecessary, while acquiring a random subset of feature values may not be most effective. The goal of active feature-value acquisition is to incrementally select feature values that are most cost-effective for improving the model's accuracy. We present an approach that acquires feature values for inducing a classification model based on an estimation of the expected improvement in model accuracy per unit cost. Experimental results demonstrate that our approach consistently reduces the cost of producing a model of a desired accuracy compared to random feature acquisitions.
    ML ID: 179
  12. Semi-supervised Clustering: Probabilistic Models, Algorithms and Experiments
    [Details] [PDF]
    Sugato Basu
    PhD Thesis, University of Texas at Austin, 2005.
    Clustering is one of the most common data mining tasks, used frequently for data categorization and analysis in both industry and academia. The focus of our research is on semi-supervised clustering, where we study how prior knowledge, gathered either from automated information sources or human supervision, can be incorporated into clustering algorithms. In this thesis, we present probabilistic models for semi-supervised clustering, develop algorithms based on these models and empirically validate their performances by extensive experiments on data sets from different domains, e.g., text analysis, hand-written character recognition, and bioinformatics.

    In many domains where clustering is applied, some prior knowledge is available either in the form of labeled data (specifying the category to which an instance belongs) or pairwise constraints on some of the instances (specifying whether two instances should be in same or different clusters). In this thesis, we first analyze effective methods of incorporating labeled supervision into prototype-based clustering algorithms, and propose two variants of the well-known KMeans algorithm that can improve their performance with limited labeled data.

    We then focus on the problem of semi-supervised clustering with constraints and show how this problem can be studied in the framework of a well-defined probabilistic generative model of a Hidden Markov Random Field. We derive an efficient KMeans-type iterative algorithm, HMRF-KMeans, for optimizing a semi-supervised clustering objective function defined on the HMRF model. We also give convergence guarantees of our algorithm for a large class of clustering distortion measures (e.g., squared Euclidean distance, KL divergence, and cosine distance).

    Finally, we develop an active learning algorithm for acquiring maximally informative pairwise constraints in an interactive query-driven framework, which to our knowledge is the first active learning algorithm for semi-supervised clustering with constraints.

    Other interesting problems of semi-supervised clustering that we discuss in this thesis include (1) semi-supervised graph-based clustering using kernels, (2) using prior knowledge to improve overlapping clustering of data, (3) integration of both constraint-based and distance-based semi-supervised clustering methods using the HMRF model, and (4) model selection techniques that use the available supervision to automatically select the right number of clusters.
    ML ID: 174
  13. Economical Active Feature-value Acquisition through Expected Utility Estimation
    [Details] [PDF]
    P. Melville, M. Saar-Tsechansky, F. Provost and Raymond J. Mooney
    In Proceedings of the KDD-05 Workshop on Utility-Based Data Mining, 10-16, Chicago, IL, August 2005.
    In many classification tasks training data have missing feature values that can be acquired at a cost. For building accurate predictive models, acquiring all missing values is often prohibitively expensive or unnecessary, while acquiring a random subset of feature values may not be most effective. The goal of Active Feature-value Acquisition is to incrementally select feature values that are most cost-effective for improving the model's accuracy. We present two policies, Sampled Expected Utility and Expected Utility-ES, that acquire feature values for inducing a classification model based on an estimation of the expected improvement in model accuracy per unit cost. A comparison of the two policies to each other and to alternative policies demonstrate that Sampled Expected Utility is preferable as it effectively reduces the cost of producing a model of a desired accuracy and exhibits a consistent performance across domains.
    ML ID: 168
  14. Active Learning for Probability Estimation using Jensen-Shannon Divergence
    [Details] [PDF]
    P. Melville, S. M. Yang, M. Saar-Tsechansky and Raymond J. Mooney
    In Proceedings of the 16th European Conference on Machine Learning, 268--279, Porto, Portugal, October 2005.
    Active selection of good training examples is an important approach to reducing data-collection costs in machine learning; however, most existing methods focus on maximizing classification accuracy. In many applications, such as those with unequal misclassification costs, producing good class probability estimates (CPEs) is more important than optimizing classification accuracy. We introduce novel variations of two extant active-learning algorithms, Boostrap-LV and ACTIVEDECORATE, by using Jensen-Shannon divergence (a similarity measure for probability distributions) to improve sample selection for optimizing CPEs. Comprehensive experimental results demonstrate the benefits of our enhancements.
    ML ID: 161
  15. Active Feature-Value Acquisition for Classifier Induction
    [Details] [PDF]
    Prem Melville, Maytal Saar-Tsechansky, Foster Provost, and Raymond J. Mooney
    Technical Report UT-AI-TR-04-311, Artificial Intelligence Lab, University of Texas at Austin, February 2004.
    Many induction problems, such as on-line customer profiling, include missing data that can be acquired at a cost, such as incomplete customer information that can be filled in by an intermediary. For building accurate predictive models, acquiring complete information for all instances is often prohibitively expensive or unnecessary. Randomly selecting instances for feature acquisition allows a representative sampling, but does not incorporate other value estimations of acquisition. Active feature-value acquisition aims at reducing the cost of achieving a desired model accuracy by identifying instances for which complete information is most informative to obtain. We present approaches in which instances are selected for feature acquisition based on the current model's ability to predict accurately and the model's confidence in its prediction. Experimental results on several real-world data sets demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions as compared to a baseline policy and a previously-published approach.
    ML ID: 158
  16. Active Feature-Value Acquisition for Classifier Induction
    [Details] [PDF]
    Prem Melville, Maytal Saar-Tsechansky, Foster Provost, and Raymond J. Mooney
    In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM-2004), 483-486, Brighton, UK, November 2004.
    Many induction problems, such as on-line customer profiling, include missing data that can be acquired at a cost, such as incomplete customer information that can be filled in by an intermediary. For building accurate predictive models, acquiring complete information for all instances is often prohibitively expensive or unnecessary. Randomly selecting instances for feature acquisition allows a representative sampling, but does not incorporate other value estimations of acquisition. Active feature-value acquisition aims at reducing the cost of achieving a desired model accuracy by identifying instances for which complete information is most informative to obtain. We present approaches in which instances are selected for feature acquisition based on the current model's ability to predict accurately and the model's confidence in its prediction. Experimental results on several real-world data sets demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions as compared to a baseline policy and a previously-published approach.
    ML ID: 157
  17. Diverse Ensembles for Active Learning
    [Details] [PDF]
    Prem Melville and Raymond J. Mooney
    In Proceedings of 21st International Conference on Machine Learning (ICML-2004), 584-591, Banff, Canada, July 2004.
    Query by Committee is an effective approach to selective sampling in which disagreement amongst an ensemble of hypotheses is used to select data for labeling. Query by Bagging and Query by Boosting are two practical implementations of this approach that use Bagging and Boosting, respectively, to build the committees. For effective active learning, it is critical that the committee be made up of consistent hypotheses that are very different from each other. DECORATE is a recently developed method that directly constructs such diverse committees using artificial training data. This paper introduces Active-Decorate, which uses Decorate committees to select good training examples. Extensive experimental results demonstrate that, in general, Active-DECORATE outperforms both Query by Bagging and Query by Boosting.
    ML ID: 146
  18. Active Semi-Supervision for Pairwise Constrained Clustering
    [Details] [PDF]
    Sugato Basu, Arindam Banerjee, and Raymond J. Mooney
    In Proceedings of the 2004 SIAM International Conference on Data Mining (SDM-04), April 2004.
    Semi-supervised clustering uses a small amount of supervised data to aid unsupervised learning. One typical approach specifies a limited number of must-link and cannot-link constraints between pairs of examples. This paper presents a pairwise constrained clustering framework and a new method for actively selecting informative pairwise constraints to get improved clustering performance. The clustering and active learning methods are both easily scalable to large datasets, and can handle very high dimensional data. Experimental and theoretical results confirm that this active querying of pairwise constraints significantly improves the accuracy of clustering when given a relatively small amount of supervision.
    ML ID: 141
  19. Active Learning for Natural Language Parsing and Information Extraction
    [Details] [PDF]
    Cynthia A. Thompson, Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of the Sixteenth International Conference on Machine Learning (ICML-99), 406-414, Bled, Slovenia, June 1999.
    In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, existing results for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to two non-classification tasks in natural language processing: semantic parsing and information extraction. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance for these complex tasks.
    ML ID: 92