Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Semi-Supervised Learning

In many learning tasks, unlabeled data is plentiful but labeled data is scarce because labels can be expensive to generate. Semi-supervised learning combines labeled and unlabeled data during training to improve performance, and it applies to both classification and clustering. In supervised classification, there is a known, fixed set of categories, and category-labeled training data is used to induce a classification function. In semi-supervised classification, training also exploits additional unlabeled data, frequently resulting in a more accurate classification function. In semi-supervised clustering, some labeled data is used along with the unlabeled data to obtain a better clustering.
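As a rough illustration of how a small amount of labeled data can steer a clustering, the sketch below runs k-means with centroids initialized from labeled seed points. It is a minimal example under stated assumptions, not the method of any particular publication listed here: the function name `seeded_kmeans`, its parameters, and the toy data are all hypothetical.

```python
from math import dist


def seeded_kmeans(points, seeds, iterations=10):
    """Cluster `points` with k-means, starting from labeled seed points.

    points: list of coordinate tuples (the unlabeled data).
    seeds:  dict mapping a cluster label to a list of labeled coordinate tuples.
    Returns (assignments, centroids); assignments[i] is the label of points[i].
    """
    def mean(vectors):
        return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))

    # The labeled data fixes the number of clusters and the starting centroids.
    centroids = {label: mean(examples) for label, examples in seeds.items()}
    assignments = []
    for _ in range(iterations):
        # Assignment step: each unlabeled point joins its nearest centroid.
        assignments = [
            min(centroids, key=lambda label: dist(p, centroids[label]))
            for p in points
        ]
        # Update step: recompute each centroid from its seeds plus assigned points.
        new_centroids = {}
        for label, examples in seeds.items():
            members = examples + [p for p, a in zip(points, assignments) if a == label]
            new_centroids[label] = mean(members)
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignments, centroids


# Toy data (hypothetical): two well-separated groups, one labeled seed per group.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.2)]
seeds = {"a": [(0.0, 0.0)], "b": [(5.0, 5.0)]}
assignments, centroids = seeded_kmeans(points, seeds)
print(assignments)  # ['a', 'a', 'b', 'b']
```

In this sketch the seeds always stay in their labeled cluster, so no cluster can become empty. That is a design choice made for simplicity here; other schemes use the seeds only for initialization.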
  1. Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems
    [Details] [PDF] [Slides (PDF)]
    Aishwarya Padmakumar
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, August 2020.
  2. Interaction and Autonomy in RoboCup@Home and Building-Wide Intelligence
    [Details] [PDF]
    Justin Hart, Harel Yedidsion, Yuqian Jiang, Nick Walker, Rishi Shah, Jesse Thomason, Aishwarya Padmakumar, Rolando Fernandez, Jivko Sinapov, Raymond Mooney, Peter Stone
    In Artificial Intelligence (AI) for Human-Robot Interaction (HRI) symposium, AAAI Fall Symposium Series, Arlington, Virginia, October 2018.
  3. Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
    [Details] [PDF]
    Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond J. Mooney
    In Late-breaking Track at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDIAL-18), Melbourne, Australia, July 2018.
  4. Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
    [Details] [PDF]
    Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond J. Mooney
    In RSS Workshop on Models and Representations for Natural Human-Robot Communication (MRHRC-18). Robotics: Science and Systems (RSS), June 2018.
  5. Continually Improving Grounded Natural Language Understanding through Human-Robot Dialog
    [Details] [PDF]
    Jesse Thomason
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, April 2018.
  6. Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions
    [Details] [PDF]
    Jesse Thomason, Jivko Sinapov, Raymond Mooney, Peter Stone
    In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), February 2018.
  7. Improving Black-box Speech Recognition using Semantic Parsing
    [Details] [PDF] [Poster]
    Rodolfo Corona, Jesse Thomason, and Raymond J. Mooney
    In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP-17), 122-127, Taipei, Taiwan, November 2017.
  8. Knowledge Transfer Using Latent Variable Models
    [Details] [PDF] [Slides (PDF)]
    Ayan Acharya
    PhD Thesis, Department of Electrical and Computer Engineering, The University of Texas at Austin, August 2015.
  9. Inducing Grammars from Linguistic Universals and Realistic Amounts of Supervision
    [Details] [PDF]
    Dan Garrette
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, 2015.
  10. A Supertag-Context Model for Weakly-Supervised CCG Parser Learning
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette, Chris Dyer, Jason Baldridge, and Noah A. Smith
    In Proceedings of the 2015 Conference on Computational Natural Language Learning (CoNLL-2015), 22-31, Beijing, China, 2015.
  11. Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith
    In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), Austin, TX, January 2015.
  12. Weakly-Supervised Bayesian Learning of a CCG Supertagger
    [Details] [PDF] [Slides (PDF)] [Poster]
    Dan Garrette, Chris Dyer, Jason Baldridge, and Noah A. Smith
    In Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL-2014), 141-150, Baltimore, MD, June 2014.
  13. Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages
    [Details] [PDF]
    Dan Garrette, Jason Mielens, and Jason Baldridge
    In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013), 583-592, Sofia, Bulgaria, August 2013.
  14. Learning a Part-of-Speech Tagger from Two Hours of Annotation
    [Details] [PDF] [Slides (PDF)] [Video]
    Dan Garrette, Jason Baldridge
    In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-13), 138-147, Atlanta, GA, June 2013.
  15. Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
    [Details] [PDF]
    Dan Garrette and Jason Baldridge
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), 821-831, Jeju, Korea, July 2012.
  16. Semi-supervised graph clustering: a kernel approach
    [Details] [PDF]
    Brian Kulis, Sugato Basu, Inderjit Dhillon, and Raymond Mooney
    Machine Learning Journal, 74(1):1-22, 2009.
  17. Watch, Listen & Learn: Co-training on Captioned Images and Videos
    [Details] [PDF]
    Sonal Gupta, Joohyun Kim, Kristen Grauman and Raymond Mooney
    In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 457-472, Antwerp, Belgium, September 2008.
  18. Semi-Supervised Learning for Semantic Parsing using Support Vector Machines
    [Details] [PDF] [Slides (PPT)]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers (NAACL/HLT-2007), 81-84, Rochester, NY, April 2007.
  19. Learnable Similarity Functions and Their Application to Record Linkage and Clustering
    [Details] [PDF]
    Mikhail Bilenko
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2006. 136 pages.
  20. Probabilistic Semi-Supervised Clustering with Constraints
    [Details] [PDF]
    Sugato Basu, Mikhail Bilenko, Arindam Banerjee and Raymond J. Mooney
    In O. Chapelle, B. Schölkopf, and A. Zien, editors, Semi-Supervised Learning, Cambridge, MA, 2006. MIT Press.
  21. Semi-supervised Clustering: Probabilistic Models, Algorithms and Experiments
    [Details] [PDF]
    Sugato Basu
    PhD Thesis, University of Texas at Austin, 2005.
  22. Semi-supervised Graph Clustering: A Kernel Approach
    [Details] [PDF]
    B. Kulis, S. Basu, I. Dhillon and Raymond J. Mooney
    In Proceedings of the 22nd International Conference on Machine Learning, 457-464, Bonn, Germany, August 2005. (Distinguished Student Paper Award).
  23. A Probabilistic Framework for Semi-Supervised Clustering
    [Details] [PDF]
    Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
    In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), 59-68, Seattle, WA, August 2004.
  24. Semi-supervised Clustering with Limited Background Knowledge
    [Details] [PDF]
    Sugato Basu
    In Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, 979-980, San Jose, CA, July 2004.
  25. Learnable Similarity Functions and Their Applications to Clustering and Record Linkage
    [Details] [PDF]
    Mikhail Bilenko
    In Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, 981-982, San Jose, CA, July 2004.
  26. Integrating Constraints and Metric Learning in Semi-Supervised Clustering
    [Details] [PDF]
    Mikhail Bilenko, Sugato Basu, and Raymond J. Mooney
    In Proceedings of the 21st International Conference on Machine Learning (ICML-2004), 81-88, Banff, Canada, July 2004.
  27. A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields
    [Details] [PDF]
    Mikhail Bilenko and Sugato Basu
    In Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields (SRL-2004), Banff, Canada, July 2004.
  28. Active Semi-Supervision for Pairwise Constrained Clustering
    [Details] [PDF]
    Sugato Basu, Arindam Banerjee, and Raymond J. Mooney
    In Proceedings of the 2004 SIAM International Conference on Data Mining (SDM-04), April 2004.
  29. Semisupervised Clustering for Intelligent User Management
    [Details] [PDF]
    Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
    In Proceedings of the IBM Austin Center for Advanced Studies 5th Annual Austin CAS Conference, Austin, TX, February 2004.
  30. Semi-supervised Clustering: Learning with Limited User Feedback
    [Details] [PDF]
    Sugato Basu
    Technical Report, Cornell University, 2004.
  31. Learnable Similarity Functions and Their Applications to Record Linkage and Clustering
    [Details] [PDF]
    Mikhail Bilenko
    2003. Doctoral Dissertation Proposal, University of Texas at Austin.
  32. Comparing and Unifying Search-Based and Similarity-Based Approaches to Semi-Supervised Clustering
    [Details] [PDF]
    Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
    In Proceedings of the ICML-2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 42-49, Washington, DC, 2003.
  33. Semi-supervised Clustering by Seeding
    [Details] [PDF]
    Sugato Basu, Arindam Banerjee, and Raymond J. Mooney
    In Proceedings of the 19th International Conference on Machine Learning (ICML-2002), 19-26, 2002.