Comparing and Unifying Search-Based and Similarity-Based Approaches to Semi-Supervised Clustering (2003)
Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has employed one of two approaches: 1) Search-based methods that utilize supervised data to guide the search for the best clustering, and 2) Similarity-based methods that use supervised data to adapt the underlying similarity metric used by the clustering algorithm. This paper presents a unified approach based on the K-Means clustering algorithm that incorporates both of these techniques. Experimental results demonstrate that the combined approach generally produces better clusters than either of the individual approaches.
In Proceedings of the ICML-2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, pp. 42-49, Washington, DC 2003.

Sugato Basu Ph.D. Alumni sugato [at] cs utexas edu
Mikhail Bilenko Ph.D. Alumni mbilenko [at] microsoft com
Raymond J. Mooney Faculty mooney [at] cs utexas edu