Representative Selection in Nonmetric Datasets (2015)
Elad Liebman, Benny Chor, and Peter Stone
This study considers the problem of representative selection: choosing a subset of data points from a dataset that best represents the dataset as a whole. This subset should reflect the kind of information contained in the entire set while minimizing redundancy. Clustering might seem like a natural approach for this purpose. However, existing clustering methods are not ideally suited to representative selection, especially for nonmetric data, where only a pairwise similarity measure is available. In this article we propose delta-medoids, a novel approach that can be viewed as an extension of the k-medoids algorithm and is specifically suited to selecting representative samples from nonmetric data. We empirically validate delta-medoids in two domains: music analysis and motion analysis. We also present theoretical bounds on the performance of delta-medoids and on the hardness of representative selection in general.
Applied Artificial Intelligence, Vol. 29, No. 8 (2015), pp. 807-838.
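The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea it describes: picking representatives using nothing but a pairwise dissimilarity, in a k-medoids-like way. It assumes (our assumption, not stated above) that a representative "covers" every item within a dissimilarity threshold `delta`, and that the dissimilarity is symmetric with zero self-distance. The function name `delta_cover`, the greedy first pass, and the refinement rule are hypothetical illustrative choices, not the authors' exact delta-medoids procedure.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def delta_cover(
    items: Sequence[T],
    dist: Callable[[T, T], float],
    delta: float,
    max_iter: int = 10,
) -> List[int]:
    """Return indices of representatives so every item lies within `delta` of one.

    Hypothetical sketch only, not the paper's exact delta-medoids algorithm.
    `dist` is only queried pairwise; it is assumed symmetric with
    dist(x, x) == 0, but no triangle inequality or coordinates are needed.
    """
    # Phase 1: one greedy pass. An item becomes a new representative
    # only if no existing representative is within delta of it.
    reps: List[int] = []
    for i, x in enumerate(items):
        if all(dist(x, items[r]) > delta for r in reps):
            reps.append(i)

    # Phase 2: k-medoids-style refinement that never breaks the delta-cover.
    for _ in range(max_iter):
        # Assign each item to its nearest representative.
        clusters: Dict[int, List[int]] = {r: [] for r in reps}
        for i, x in enumerate(items):
            nearest = min(reps, key=lambda r: dist(x, items[r]))
            clusters[nearest].append(i)

        new_reps: List[int] = []
        for r, members in clusters.items():
            if not members:
                continue  # drop representatives that no longer own any item
            # Only members covering the whole cluster within delta qualify;
            # the current representative r always does, so it is the fallback.
            candidates = [
                c for c in members
                if all(dist(items[c], items[m]) <= delta for m in members)
            ] or [r]
            # Among valid candidates, keep the medoid (smallest total dissimilarity).
            best = min(candidates, key=lambda c: sum(dist(items[c], items[m]) for m in members))
            new_reps.append(best)

        if new_reps == reps:
            break  # converged: no representative changed
        reps = new_reps
    return reps


points = [0.0, 0.5, 1.0, 5.0, 5.4, 10.0]
print(delta_cover(points, lambda a, b: abs(a - b), delta=1.0))  # → [1, 3, 5]
```

Because the sketch only ever calls `dist` on pairs of items, it applies unchanged to objects such as music segments or motion sequences, provided some pairwise dissimilarity function is available. Each refined representative is required to cover its whole cluster, so the coverage guarantee from the greedy pass is preserved.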

Elad Liebman Ph.D. Student eladlieb [at] cs utexas edu
Peter Stone Faculty pstone [at] cs utexas edu