Comparing and Unifying Search-Based and Similarity-Based Approaches to Semi-Supervised Clustering (2003)

Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney

Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has employed one of two approaches: 1) Search-based methods that utilize supervised data to guide the search for the best clustering, and 2) Similarity-based methods that use supervised data to adapt the underlying similarity metric used by the clustering algorithm. This paper presents a unified approach based on the K-Means clustering algorithm that incorporates both of these techniques. Experimental results demonstrate that the combined approach generally produces better clusters than either of the individual approaches.
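The two ideas named in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, and the paper's actual formulation is not reproduced here. It assumes labeled seed points per cluster, uses them to initialize the K-Means centroids (a search-based use of supervision), and uses them to set per-feature weights for a weighted Euclidean distance (a similarity-based use of supervision). The function names and the inverse-variance weighting rule are hypothetical choices made for illustration only.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def learn_weights(seeds, eps=1e-6):
    """Similarity-based step (assumed rule): weight each feature by the
    inverse of its pooled within-class variance over the labeled seeds."""
    dim = len(next(iter(seeds.values()))[0])
    sq_dev = [0.0] * dim
    count = 0
    for pts in seeds.values():
        m = mean(pts)
        for p in pts:
            for j in range(dim):
                sq_dev[j] += (p[j] - m[j]) ** 2
            count += 1
    return [1.0 / (s / count + eps) for s in sq_dev]


def seeded_weighted_kmeans(points, seeds, iters=20):
    """K-Means on `points` using supervision from `seeds` in two ways:
    centroids start at the seed means, and distances use learned weights."""
    labels = sorted(seeds)
    centroids = [mean(seeds[k]) for k in labels]  # search-based step
    weights = learn_weights(seeds)                # similarity-based step
    assign = []
    for _ in range(iters):
        new_assign = [
            min(range(len(centroids)),
                key=lambda k: weighted_sq_dist(p, centroids[k], weights))
            for p in points
        ]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean(members)
    return assign, centroids, weights


# Toy data (made up): feature 0 separates the clusters, feature 1 is noise.
seeds = {0: [[0.0, 5.0], [0.2, -5.0]], 1: [[4.0, 5.0], [4.2, -5.0]]}
points = [[0.1, 9.0], [3.9, -9.0], [0.3, -8.0], [4.1, 8.0]]
assign, centroids, weights = seeded_weighted_kmeans(points, seeds)
```

On this toy data the learned weights emphasize feature 0, since the seeds vary little along it within each class, and the seed means give each cluster a starting centroid on the correct side.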

Citation:

In Proceedings of the ICML-2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, pp. 42-49, Washington, DC, 2003.

People

Sugato Basu	Ph.D. Alumni	sugato [at] cs utexas edu
Mikhail Bilenko	Ph.D. Alumni	mbilenko [at] microsoft com
Raymond J. Mooney	Faculty	mooney [at] cs utexas edu

Areas of Interest

Machine Learning, Semi-Supervised Learning

Labs

Machine Learning