
Inderjit S. Dhillon's Software
Software
 Distance Metric Learning Software

ITML
(Version 1.1)
is a Matlab implementation of the Information-Theoretic
Metric Learning algorithm. Metric learning involves finding a
suitable metric for a given set of data points, using side
information about the distances between a few of those data points.
ITML characterizes the metric using a Mahalanobis distance function
and learns the associated parameters using Bregman's cyclic
projection algorithm.
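
To make the Mahalanobis distance function concrete, here is a minimal Python sketch (not the Matlab code shipped with ITML). It only evaluates the squared distance (x - y)^T A (x - y) for a given matrix A; how ITML learns A through Bregman projections is not shown. The function name `mahalanobis_sq` and the example matrices are assumptions made for illustration.

```python
def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance (x - y)^T A (x - y).

    A is assumed to be a symmetric positive semi-definite matrix,
    given as a list of rows. Hypothetical helper, not part of ITML.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    # Matrix-vector product A * d.
    Ad = [sum(a_ij * d_j for a_ij, d_j in zip(row, d)) for row in A]
    # Inner product d^T (A d).
    return sum(d_i * ad_i for d_i, ad_i in zip(d, Ad))


identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to squared Euclidean distance
stretched = [[4.0, 0.0], [0.0, 1.0]]  # weights the first coordinate more heavily

x, y = [1.0, 2.0], [3.0, 5.0]
d_identity = mahalanobis_sq(x, y, identity)
d_stretched = mahalanobis_sq(x, y, stretched)
```

With the identity matrix the function returns the ordinary squared Euclidean distance; a learned matrix would instead stretch or shrink directions so that points marked as similar end up close together.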
 Graph Clustering Software

Graclus
(Version 1.0)
is fast graph clustering software that computes normalized cut and ratio association clusterings of a given graph without any eigenvector computation. This is possible because we establish a mathematical equivalence between general cut or association objectives (including normalized cut and ratio association) and the weighted kernel k-means objective. One important implication of this equivalence is that we can run a k-means type of iterative algorithm to optimize general cut or association objectives. Therefore, unlike spectral methods, our algorithm entirely avoids time-consuming eigenvector computation. We embed the weighted kernel k-means algorithm in a multilevel framework to obtain this fast software for graph clustering.
 Co-clustering Software
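
The k-means type of iteration mentioned above can be sketched in Python as follows. This is a minimal batch weighted kernel k-means on a precomputed kernel matrix, not the Graclus implementation: it leaves out the multilevel framework, and it does not show how the kernel and weights are built from a graph. The function name, its parameters, and the toy linear kernel are assumptions chosen so the result is easy to check by hand.

```python
def weighted_kernel_kmeans(K, w, labels, k, max_iter=100):
    """Batch weighted kernel k-means (illustrative sketch).

    K: n x n kernel matrix as a list of rows.
    w: positive weight for each point.
    labels: initial cluster index for each point.
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # Total weight of each cluster.
        s = [sum(w[j] for j in m) for m in members]
        # Squared norm of each weighted centroid in feature space.
        third = [
            sum(w[j] * w[l] * K[j][l] for j in m for l in m) / (s[c] ** 2) if m else 0.0
            for c, m in enumerate(members)
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                cross = sum(w[j] * K[i][j] for j in m) / s[c]
                # Squared feature-space distance from point i to centroid c.
                d = K[i][i] - 2.0 * cross + third[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy check: a linear kernel on 1-D points with unit weights behaves
# like ordinary k-means.
xs = [0.0, 1.0, 10.0, 11.0]
K = [[a * b for b in xs] for a in xs]
result = weighted_kernel_kmeans(K, [1.0] * 4, [0, 1, 0, 1], 2)
```

Only kernel entries are used, never explicit coordinates or eigenvectors, which is what lets the same loop be applied to a kernel derived from a graph.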

Cocluster
(Version 1.1) is a C++ program that implements three co-clustering algorithms: the information-theoretic co-clustering algorithm and two types of minimum sum-squared residue co-clustering algorithms. In our implementation, all the algorithms have a ping-pong structure, i.e., a batch algorithm followed by a corresponding chain of first variations. Each algorithm also has five variations, based on the order in which the row and column centroids are updated.
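
As an illustration of a sum-squared residue objective, the Python sketch below scores a given co-clustering by summing the squared deviation of every matrix entry from the mean of its co-cluster. The text above does not define the two residue types, so this definition is an assumption, as are the function name and the example matrix. It is not the Cocluster program's code, and it only evaluates the objective rather than minimizing it.

```python
def sum_squared_residue(A, row_labels, col_labels, k, l):
    """Sum of squared deviations of each entry from its co-cluster mean.

    A: matrix as a list of rows; row_labels / col_labels assign each row
    and column to one of k row clusters and l column clusters.
    Assumed residue definition: entry minus co-cluster mean.
    """
    totals = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            totals[row_labels[i]][col_labels[j]] += a
            counts[row_labels[i]][col_labels[j]] += 1
    means = [
        [totals[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(l)]
        for r in range(k)
    ]
    return sum(
        (a - means[row_labels[i]][col_labels[j]]) ** 2
        for i, row in enumerate(A)
        for j, a in enumerate(row)
    )


A = [
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 1.0, 5.0, 5.0],
    [9.0, 9.0, 3.0, 3.0],
]
cols = [0, 0, 1, 1]
good = sum_squared_residue(A, [0, 0, 1], cols, 2, 2)  # rows grouped by block
bad = sum_squared_residue(A, [0, 1, 1], cols, 2, 2)   # second row misplaced
```

A co-clustering that matches the block structure of the matrix has zero residue, while misplacing a row raises it; a batch step or a first variation move would be accepted when it lowers this value.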
 Clustering Software

Gmeans
is a C++ program for clustering. At the heart of the program is the K-means clustering algorithm with four different distance (similarity) measures, six initialization methods, and a powerful local search strategy called first variation.
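
The idea behind first variation can be sketched in Python as follows: instead of reassigning all points at once, try moving one point to another cluster and keep the single move that lowers the objective the most. This is a brute-force illustration, not the Gmeans code. It assumes squared Euclidean distance (only one of the measures Gmeans supports), and the function names are hypothetical.

```python
def kmeans_objective(points, labels, k):
    """Sum of squared Euclidean distances of points to their cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def first_variation_step(points, labels, k):
    """Return the best single-point move (labels, objective), or the input if none helps."""
    best_labels = list(labels)
    best_obj = kmeans_objective(points, labels, k)
    for i in range(len(points)):
        if labels.count(labels[i]) == 1:
            continue  # moving this point would empty its cluster
        for c in range(k):
            if c == labels[i]:
                continue
            trial = list(labels)
            trial[i] = c
            obj = kmeans_objective(points, trial, k)
            if obj < best_obj:
                best_labels, best_obj = trial, obj
    return best_labels, best_obj


# Batch k-means is stuck here: the point 2.0 is equally far from both
# centroids (1.0 and 3.0), so it has no reason to change cluster.
points = [(0.0,), (2.0,), (3.0,)]
labels = [0, 0, 1]
start_obj = kmeans_objective(points, labels, 2)
new_labels, new_obj = first_variation_step(points, labels, 2)
```

In this example the batch step makes no change, yet moving the middle point lowers the objective, because the move also shifts both centroids. That effect is what a first variation search can exploit.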
 Visualization Software

CViz
is a visualization tool designed for analyzing high-dimensional data (data with many elements) in large, complex data sets. CViz easily loads the data sets, displays the most important factors relating clusters of records, and provides full-motion visualization of the inherent data clusters.


