Inderjit S. Dhillon's Software


  • Distance Metric Learning Software
    • ITML (Version 1.1) is a Matlab implementation of the Information Theoretic Metric Learning algorithm. Metric learning involves finding a suitable metric for a given set of data points, given side information about the distances between a few of those points. ITML parameterizes the metric as a Mahalanobis distance function and learns its parameters using Bregman's cyclic projection algorithm.
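To make the Mahalanobis parameterization concrete, here is a minimal Python sketch of the squared distance (x - y)^T A (x - y). It is not taken from the ITML code (which is Matlab); the function name `mahalanobis_sq` and the example matrices are assumptions for illustration, and the learning step that would produce the matrix A is not shown.

```python
def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance (x - y)^T A (x - y).

    A is assumed to be a symmetric positive semidefinite matrix,
    given as a list of rows. Hypothetical helper, not part of ITML.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    # Matrix-vector product A d
    Ad = [sum(a * dj for a, dj in zip(row, d)) for row in A]
    # Inner product d^T (A d)
    return sum(di * adi for di, adi in zip(d, Ad))


identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to squared Euclidean distance
stretched = [[4.0, 0.0], [0.0, 1.0]]  # weights the first coordinate more heavily
x, y = [1.0, 2.0], [3.0, 5.0]

print(mahalanobis_sq(x, y, identity))   # 13.0
print(mahalanobis_sq(x, y, stretched))  # 25.0
```

With the identity matrix the metric is the ordinary squared Euclidean distance; a learned matrix changes how much each direction contributes.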

  • Graph Clustering Software
    • Graclus (Version 1.0) is fast graph clustering software that computes normalized cut and ratio association clusterings of a given graph without any eigenvector computation. This is possible because we establish a mathematical equivalence between general cut or association objectives (including normalized cut and ratio association) and the weighted kernel k-means objective. One important implication of this equivalence is that we can run a k-means type of iterative algorithm to optimize general cut or association objectives. Therefore, unlike spectral methods, our algorithm entirely avoids time-consuming eigenvector computation. The software embeds the weighted kernel k-means algorithm in a multilevel framework.
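The batch weighted kernel k-means iteration mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not the Graclus implementation, it omits the multilevel framework, and the toy example uses a plain linear kernel on 1-D points rather than a kernel derived from a graph. The function name `weighted_kernel_kmeans` and its parameters are hypothetical.

```python
def weighted_kernel_kmeans(K, w, labels, k, max_iter=100):
    """Batch weighted kernel k-means on a kernel matrix K with point weights w.

    The squared distance from point i to the weighted mean of cluster c is
        K[i][i] - 2 * sum_j w[j] K[i][j] / s_c + sum_{j,l} w[j] w[l] K[j][l] / s_c^2
    where j, l range over cluster c and s_c is the total weight of c.
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        mass = [sum(w[j] for j in m) for m in members]
        # The last term does not depend on i, so compute it once per cluster.
        within = [
            sum(w[j] * w[l] * K[j][l] for j in m for l in m) / (s * s) if s > 0 else 0.0
            for m, s in zip(members, mass)
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c in range(k):
                if mass[c] == 0:
                    continue  # skip empty clusters
                cross = sum(w[j] * K[i][j] for j in members[c]) / mass[c]
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy data: a linear kernel on four 1-D points with unit weights.
pts = [0.0, 0.1, 5.0, 5.1]
K = [[p * q for q in pts] for p in pts]
w = [1.0] * len(pts)

print(weighted_kernel_kmeans(K, w, [0, 1, 0, 1], 2))  # [0, 0, 1, 1]
```

Only kernel entries and weights are used, never explicit coordinates or eigenvectors, which is the property the equivalence relies on.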

  • Co-Clustering Software
    • Co-cluster (Version 1.1) is a C++ program that implements three co-clustering algorithms: the information-theoretic co-clustering algorithm and two types of minimum sum-squared residue co-clustering algorithms. In our implementation, all the algorithms have a ping-pong structure, i.e., a batch algorithm followed by a corresponding chain of first variations. Each algorithm also has five variations, which differ in the order in which the row and column centroids are updated.

  • Clustering Software
    • Gmeans is a C++ program for clustering. At the heart of the program is the K-means clustering algorithm with four different distance (similarity) measures, six initialization methods, and a powerful local search strategy called first variation.
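The idea behind first variation, moving a single point to another cluster when that lowers the objective, can be illustrated with a short Python sketch. It is not the Gmeans code (which is C++): it assumes squared Euclidean distance only, and the names `sq_dist`, `centroid`, and `first_variation_step` are hypothetical. The cost change of moving point x from cluster a to cluster b is n_b/(n_b+1) * ||x - m_b||^2 - n_a/(n_a-1) * ||x - m_a||^2, where n and m are cluster sizes and centroids.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def first_variation_step(points, labels, k):
    """Apply the single-point move that lowers the k-means objective most.

    Returns (new_labels, delta); delta is 0.0 if no move helps.
    """
    members = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
    sizes = [len(m) for m in members]
    cents = [centroid(m) if m else None for m in members]
    best_delta, best_i, best_b = 0.0, None, None
    for i, (p, a) in enumerate(zip(points, labels)):
        if sizes[a] <= 1:
            continue  # do not empty a cluster
        removal = sizes[a] / (sizes[a] - 1) * sq_dist(p, cents[a])
        for b in range(k):
            if b == a:
                continue
            if sizes[b] == 0:
                addition = 0.0  # a point alone in a cluster costs nothing
            else:
                addition = sizes[b] / (sizes[b] + 1) * sq_dist(p, cents[b])
            delta = addition - removal
            if delta < best_delta:
                best_delta, best_i, best_b = delta, i, b
    new_labels = list(labels)
    if best_i is not None:
        new_labels[best_i] = best_b
    return new_labels, best_delta


# Points 0, 2, 3 with clusters {0, 2} and {3}: the point 2 is equidistant from
# both centroids (1 and 3), so a batch reassignment step has no reason to move it.
points = [[0.0], [2.0], [3.0]]
new_labels, delta = first_variation_step(points, [0, 0, 1], 2)
print(new_labels, delta)  # [0, 1, 1] -1.5
```

In this example the objective drops from 2.0 to 0.5 after the move, because the formula accounts for how both centroids shift once the point changes cluster.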

  • Visualization Software
    • CViz is a visualization tool designed for analyzing high-dimensional data (data with many elements) in large, complex data sets. CViz easily loads the data sets, displays the most important factors relating clusters of records, and provides full-motion visualization of the inherent data clusters.


Department of Computer Science, University of Texas at Austin