UT ML Group: Unsupervised and Semi-Supervised Learning and Clustering

In many learning tasks, unlabeled data is plentiful but labeled data is scarce, because labels can be expensive to generate. Semi-supervised learning combines labeled and unlabeled data during training to improve performance, and it is applicable to both classification and clustering. In supervised classification, there is a known, fixed set of categories, and category-labeled training data is used to induce a classification function. In semi-supervised classification, training also exploits additional unlabeled data, frequently resulting in a more accurate classification function. In unsupervised clustering, an unlabeled dataset is partitioned into groups of similar examples, typically by optimizing an objective function that characterizes good partitions. In semi-supervised clustering, some labeled data is used along with the unlabeled data to obtain a better clustering.
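
As a rough illustration of how a small amount of labeled data can guide a clustering, the sketch below runs ordinary k-means but initializes each centroid from the labeled points of that cluster rather than at random. It is a minimal example written for this page, not the implementation from any of the papers listed here; the function names (`mean`, `seeded_kmeans`), the `seeds` dictionary format, and the toy data are all assumptions made for illustration.

```python
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def seeded_kmeans(points, seeds, max_iters=100):
    """Cluster `points` with k-means, initializing centroids from labeled seeds.

    points: list of coordinate tuples (all examples, labeled and unlabeled).
    seeds:  dict mapping an index into `points` to a cluster id in 0..k-1;
            every cluster id is assumed to have at least one seed.
    Returns (assignments, centroids).
    """
    k = max(seeds.values()) + 1
    # Initial centroids come from the labeled seeds instead of random picks.
    centroids = [
        mean([points[i] for i, c in seeds.items() if c == cluster])
        for cluster in range(k)
    ]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the procedure has converged
        assignments = new_assignments
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignments, centroids


# Toy data: six points, of which only two (indices 0 and 3) carry labels.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
seeds = {0: 0, 3: 1}
assignments, centroids = seeded_kmeans(points, seeds)
```

In this sketch the labels only influence the starting centroids; after initialization the seeds are free to change cluster like any other point. Approaches that keep the labeled points fixed, or that encode supervision as pairwise constraints, would need a different assignment step.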

Publications

  1. Semi-Supervised Learning for Semantic Parsing using Support Vector Machines [Abstract] [PDF]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers (NAACL/HLT-2007), pp. 81-84, Rochester, NY, April 2007.

  2. Learnable Similarity Functions and Their Application to Record Linkage and Clustering [Abstract] [PDF]
    Mikhail Bilenko
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2006.
    136 pages.

  3. Probabilistic Semi-Supervised Clustering with Constraints [Abstract] [PDF]
    Sugato Basu, Mikhail Bilenko, Arindam Banerjee and Raymond J. Mooney
    In Semi-Supervised Learning, O. Chapelle, B. Schoelkopf, and A. Zien (eds.), pp. 73-102, MIT Press, Cambridge, MA, 2006.

  4. Semi-supervised Clustering: Probabilistic Models, Algorithms and Experiments [Abstract] [PDF]
    Sugato Basu
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, May 2005.
    157 pages.

  5. Semi-supervised Graph Clustering: A Kernel Approach [Abstract] [PDF]
    Kulis, B., Basu, S., Dhillon, I., and Mooney, R.J.
    Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, pp. 457-464, August 2005. (Distinguished Student Paper Award)

  6. Model-based Overlapping Clustering [Abstract] [PDF]
    Banerjee, A., Krumpelman, C., Basu, S., Mooney, R.J., and Ghosh, J.
    Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, pp. 532-537, August 2005.

  7. A Probabilistic Framework for Semi-Supervised Clustering [Abstract] [PDF]
    Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
    Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), pp. 59-68, Seattle, WA, August 2004. (Best Research Paper Award)

  8. Semi-supervised Clustering with Limited Background Knowledge [Abstract] [PDF]
    Sugato Basu
    Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, pp. 979-980, San Jose, CA, July 2004.

  9. Learnable Similarity Functions and Their Applications to Clustering and Record Linkage [Abstract] [PDF]
    Mikhail Bilenko
    Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, pp. 981-982, San Jose, CA, July 2004.

  10. Integrating Constraints and Metric Learning in Semi-Supervised Clustering [Abstract] [PDF]
    Mikhail Bilenko, Sugato Basu, and Raymond J. Mooney
    Proceedings of the 21st International Conference on Machine Learning (ICML-2004), pp. 81-88, Banff, Canada, July 2004.

  11. A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields [Abstract] [PDF]
    Mikhail Bilenko and Sugato Basu
    Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields (SRL-2004), Banff, Canada, July 2004.

  12. Active Semi-Supervision for Pairwise Constrained Clustering [Abstract] [PDF]
    Sugato Basu, Arindam Banerjee, and Raymond J. Mooney
    Proceedings of the SIAM International Conference on Data Mining (SDM-2004), pp. 333-344, Lake Buena Vista, FL, April 2004.

  13. Semisupervised Clustering for Intelligent User Management [Abstract] [PDF]
    Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
    Proceedings of the IBM Austin Center for Advanced Studies 5th Annual Austin CAS Conference, Austin, TX, February 2004.

  14. Semi-supervised Clustering: Learning with Limited User Feedback [Abstract] [PDF]
    Sugato Basu
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, November 2003.
    47 pages.
    Also appears as Technical Report UT-AI-TR-03-307, Artificial Intelligence Lab, University of Texas at Austin, January 2004.

  15. Learnable Similarity Functions and Their Applications to Record Linkage and Clustering [Abstract] [PDF]
    Mikhail Bilenko
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, October 2003.
    47 pages.
    Also appears as Technical Report UT-AI-TR-03-305, Artificial Intelligence Lab, University of Texas at Austin, December 2003.

  16. Comparing and Unifying Search-Based and Similarity-Based Approaches to Semi-Supervised Clustering [Abstract] [PDF]
    Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney
    Proceedings of the ICML-2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, pp. 42-49, Washington DC, August 2003.

  17. Semi-supervised Clustering by Seeding [Abstract] [PDF]
    Sugato Basu, Arindam Banerjee, and Raymond J. Mooney
    Proceedings of the Nineteenth International Conference on Machine Learning (ICML-2002), pp. 19-26, Sydney, Australia, July 2002.


mooney@cs.utexas.edu