Inderjit S. Dhillon's Software
- Distance Metric
ITML is a Matlab implementation of the Information Theoretic Metric Learning algorithm. Metric learning involves finding a suitable metric for a given set of data points, using side information about the distances between a few of those points. ITML characterizes the metric with a Mahalanobis distance function and learns the associated parameters using Bregman's cyclic projection algorithm.
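To make the Mahalanobis parameterization concrete, here is a minimal Python sketch of the squared distance d_A(x, y) = (x - y)^T A (x - y) for a given matrix A. The function name and the example matrices are illustrative assumptions and are not taken from the Matlab package. Learning A from the side information is not shown.

```python
def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance (x - y)^T A (x - y).

    A is a square matrix given as a list of rows. It is assumed to be
    symmetric positive semidefinite; this is not checked here.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    # Compute the vector A d, then the inner product d . (A d).
    Ad = [sum(a_ij * d_j for a_ij, d_j in zip(row, d)) for row in A]
    return sum(d_i * ad_i for d_i, ad_i in zip(d, Ad))


# Hypothetical example: the identity matrix gives the squared Euclidean
# distance, while another A stretches or shrinks individual directions.
identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]
x, y = [1.0, 2.0], [3.0, 5.0]
d_euclid = mahalanobis_sq(x, y, identity)    # 13.0
d_learned = mahalanobis_sq(x, y, stretched)  # 25.0
```

With the stretched matrix, differences along the first coordinate count four times as much, which is the kind of reweighting a learned metric can express.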
- Graph Clustering Software
This is fast graph clustering software that computes normalized cut and ratio association clusterings for a given graph without any eigenvector computation. This is possible because we establish a mathematical equivalence between general cut or association objectives (including normalized cut and ratio association) and the weighted kernel k-means objective. One important implication of this equivalence is that we can run a k-means type of iterative algorithm to minimize general cut or association objectives. Unlike spectral methods, our algorithm therefore avoids time-consuming eigenvector computation entirely. We embed the weighted kernel k-means algorithm in a multilevel framework to obtain this fast software for graph clustering.
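The equivalence described above can be sketched in a few lines of Python. The sketch assumes the kernel construction usually given for the normalized-cut case, K = sigma * D^-1 + D^-1 A D^-1 with node weights equal to the degrees, and runs a plain batch weighted kernel k-means iteration on it. All function names, the toy graph, and the value of sigma are illustrative assumptions. The multilevel framework and any local search are not shown, and the graph is assumed to have no isolated nodes.

```python
def normalized_cut_kernel(adj, sigma=1.0):
    """Return (K, w) with K = sigma * D^-1 + D^-1 A D^-1 and w = degrees."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    K = [[adj[i][j] / (deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += sigma / deg[i]
    return K, deg


def weighted_kernel_kmeans(K, w, labels, k, max_iter=100):
    """Batch weighted kernel k-means starting from the given labels."""
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        mass = [sum(w[j] for j in m) for m in members]
        # Squared norm of each weighted cluster mean in feature space.
        within = [
            sum(w[j] * w[l] * K[j][l] for j in m for l in m) / (s * s)
            if s > 0 else 0.0
            for m, s in zip(members, mass)
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c in range(k):
                if mass[c] == 0:
                    continue  # skip empty clusters
                cross = sum(w[j] * K[i][j] for j in members[c]) / mass[c]
                dist = K[i][i] - 2.0 * cross + within[c]
                if dist < best_d:
                    best_c, best_d = c, dist
            new_labels.append(best_c)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def normalized_cut(adj, labels, k):
    """Sum over clusters of (edges leaving the cluster) / (cluster degree)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    total = 0.0
    for c in range(k):
        inside = [i for i in range(n) if labels[i] == c]
        volume = sum(deg[i] for i in inside)
        cut = sum(adj[i][j] for i in inside for j in range(n) if labels[j] != c)
        if volume > 0:
            total += cut / volume
    return total


# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
K, w = normalized_cut_kernel(adj, sigma=0.5)  # sigma chosen for illustration
start = [0, 0, 0, 0, 1, 1]                    # node 3 starts in the wrong triangle
final = weighted_kernel_kmeans(K, w, start, k=2)
```

On this toy graph the iteration moves node 3 into the second triangle, lowering the normalized cut from 0.7 to 2/7, and no eigenvectors are computed along the way. A sufficiently large sigma makes K positive semidefinite, which is what guarantees monotone convergence, but it also makes points more reluctant to change clusters; the value 0.5 is used here only to keep the example short.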
- Co-Clustering Software
This package (Version 1.1) is a C++ program that implements three co-clustering algorithms: an information-theoretic co-clustering algorithm and two types of minimum sum-squared residue co-clustering algorithms. In our implementation, all the algorithms have a ping-pong structure, i.e., a batch algorithm followed by a corresponding chain of first variations. Each algorithm also has five variations, which differ in the order in which the row and column centroids are updated.
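As a rough illustration of the minimum sum-squared residue objective named above, the following Python sketch evaluates the objective for a fixed assignment of rows and columns to clusters. The two residue definitions are not spelled out here, so the sketch assumes the commonly used forms: the entry minus its co-cluster mean, and the entry minus its row mean and column mean within the co-cluster plus the co-cluster mean. The function names and the `use_row_col_means` flag are hypothetical, and the batch and first-variation updates of the C++ program are not reproduced.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def sum_squared_residue(matrix, row_labels, col_labels, use_row_col_means=False):
    """Sum of squared residues for a given co-clustering.

    Assumed residue definitions (not taken from the program itself):
      default:            h_ij = a_ij - block_mean
      use_row_col_means:  h_ij = a_ij - row_mean - col_mean + block_mean
    where all means are taken inside the co-cluster containing (i, j).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    total = 0.0
    for i in range(n_rows):
        same_rows = [r for r in range(n_rows) if row_labels[r] == row_labels[i]]
        for j in range(n_cols):
            same_cols = [c for c in range(n_cols) if col_labels[c] == col_labels[j]]
            block_mean = mean(matrix[r][c] for r in same_rows for c in same_cols)
            if use_row_col_means:
                row_mean = mean(matrix[i][c] for c in same_cols)
                col_mean = mean(matrix[r][j] for r in same_rows)
                h = matrix[i][j] - row_mean - col_mean + block_mean
            else:
                h = matrix[i][j] - block_mean
            total += h * h
    return total


# Hypothetical 3 x 4 matrix with an obvious block structure.
matrix = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [9, 9, 3, 3],
]
rows = [0, 0, 1]
good_cols = [0, 0, 1, 1]  # matches the block structure
bad_cols = [0, 1, 0, 1]   # mixes the two column groups
res_good = sum_squared_residue(matrix, rows, good_cols)  # 0.0
res_bad = sum_squared_residue(matrix, rows, bad_cols)    # 68.0
res_bad_additive = sum_squared_residue(matrix, rows, bad_cols, use_row_col_means=True)  # 0.0
```

The last value shows how the two assumed definitions differ: the second one tolerates additive row and column offsets within a co-cluster, so it reports zero residue even for the mixed column assignment.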
- Clustering Software
This is a C++ program for clustering. At the heart of the program is the K-means clustering algorithm with four different distance (similarity) measures, six initialization methods, and a powerful local search strategy called first variation.
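The idea behind first variation can be sketched in Python as follows: instead of reassigning every point to its nearest centroid, try moving a single point to another cluster and keep the move that lowers the objective most. The sketch assumes squared Euclidean distance (only one of the measures the program supports) and recomputes the objective by brute force for clarity. The function names and the data are illustrative and do not come from the C++ program.

```python
def sse(points, labels, k):
    """Sum of squared Euclidean distances of points to their cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def first_variation_step(points, labels, k):
    """Try every single-point move and apply the best improving one.

    Returns (new_labels, improved).
    """
    base = sse(points, labels, k)
    best_gain, best_move = 0.0, None
    for i in range(len(points)):
        for c in range(k):
            if c == labels[i]:
                continue
            trial = list(labels)
            trial[i] = c
            gain = base - sse(points, trial, k)
            if gain > best_gain + 1e-12:
                best_gain, best_move = gain, (i, c)
    if best_move is None:
        return list(labels), False
    new_labels = list(labels)
    new_labels[best_move[0]] = best_move[1]
    return new_labels, True


# Hypothetical 1-D data on which batch k-means is stuck: the point 2.0 is
# closer to its own centroid (1.0) than to the other centroid (3.5).
points = [[0.0], [2.0], [3.5]]
labels = [0, 0, 1]
labels_after, improved = first_variation_step(points, labels, k=2)
```

Here the batch update leaves the labels unchanged, but moving the point 2.0 to the second cluster lowers the objective from 2.0 to 1.125, because the move also shifts both centroids. A practical implementation would compute the change in objective incrementally rather than recomputing it for every trial move.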
- Visualization Software
CViz is a visualization tool designed for analyzing high-dimensional data (data with many elements) in large, complex data sets. CViz easily loads the data sets, displays the most important factors relating clusters of records, and provides full-motion visualization of the inherent data clusters.