CS 395T - Large-scale Data Mining
The main goal of this homework is to experiment with
some clustering techniques.
Read the paper to understand the spherical k-means algorithm.
Download the spkmeans
code and compile it.
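As a rough illustration of what the spherical k-means algorithm does (a minimal sketch, not the spkmeans code itself), the Python fragment below normalizes each document vector to unit length, assigns it to the concept vector with the largest cosine similarity, and recomputes each concept vector as the normalized sum of its members. The function names, the dense list-of-lists representation, and the initialization from the first k documents are assumptions made for this example.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u, v):
    """Dot product; for unit vectors this equals the cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(docs, k, max_iter=100):
    """Cluster dense document vectors; returns (labels, objective)."""
    docs = [normalize(d) for d in docs]
    # Assumption for this sketch: seed the concept vectors with the first k documents.
    concepts = [list(d) for d in docs[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: pick the concept vector with the largest cosine similarity.
        new_labels = [max(range(k), key=lambda c: dot(d, concepts[c])) for d in docs]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each concept vector is the normalized sum of its members.
        for c in range(k):
            members = [d for d, l in zip(docs, labels) if l == c]
            if members:
                concepts[c] = normalize([sum(col) for col in zip(*members)])
    # Objective: total cosine similarity between documents and their concept vectors.
    objective = sum(dot(d, concepts[l]) for d, l in zip(docs, labels))
    return labels, objective
```

The objective can only lie between 0 and the number of documents when all entries are nonnegative, so it is a convenient sanity check when comparing runs.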
- Download the data matrices here. Understand
Download the Java cluster
browser program that generates a sequence of web pages
for browsing your clustering results (see a sample).
Download the Metis graph partitioning software
and install it.
A C program is provided that transforms the matrix from CCS format into the input format for Metis.
Run Metis on the same matrix used for spkmeans (note that
this corresponds to a bipartite graph between words and documents).
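To make the bipartite view concrete, here is a hedged Python sketch of turning a word-by-document matrix in CCS (compressed column storage) form into the lines of a Metis graph file. It is not the C program mentioned above. The function name, the vertex numbering (words first, then documents), and the `scale` factor used to turn real-valued entries into the positive integer edge weights Metis expects are all assumptions of this example; check the Metis manual for the exact header your version requires.

```python
def ccs_to_metis_lines(n_rows, col_ptr, row_ind, values, scale=1000):
    """Build Metis graph-file lines for the word/document bipartite graph.

    n_rows  : number of words (matrix rows)
    col_ptr : CCS column pointers, length n_cols + 1
    row_ind : row index of each stored nonzero (0-based)
    values  : value of each stored nonzero (assumed positive)
    """
    n_cols = len(col_ptr) - 1
    adjacency = [[] for _ in range(n_rows + n_cols)]
    for j in range(n_cols):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            i = row_ind[k]
            # Metis edge weights are integers, so scale and round (assumption).
            w = max(1, round(values[k] * scale))
            adjacency[i].append((n_rows + j + 1, w))   # word i -> document j (1-based id)
            adjacency[n_rows + j].append((i + 1, w))   # document j -> word i (1-based id)
    # Header: vertex count, edge count, and a format code requesting edge weights.
    lines = [f"{n_rows + n_cols} {len(row_ind)} 001"]
    for neighbours in adjacency:
        lines.append(" ".join(f"{v} {w}" for v, w in neighbours))
    return lines
```

Each nonzero of the matrix becomes exactly one undirected edge, which is why the edge count in the header is the number of stored nonzeros.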
Run Metis on the graph of documents, where an edge between two
documents has weight equal to their cosine similarity.
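The edge weights of the document graph can be computed along the following lines. This is only a sketch: the sparse dictionary representation (term id mapped to weight), the function names, and the decision to omit edges with zero similarity are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term_id: weight} dicts."""
    shared = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return shared / (norm_u * norm_v)


def document_graph(docs):
    """Return {(a, b): similarity} for every document pair a < b that shares a term."""
    edges = {}
    for a in range(len(docs)):
        for b in range(a + 1, len(docs)):
            s = cosine(docs[a], docs[b])
            if s > 0:
                edges[(a, b)] = s
    return edges
```

Comparing all pairs is quadratic in the number of documents, which is workable for a few hundred documents but may need an inverted index for larger collections.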
Write a hierarchical agglomerative clustering program in C/C++/Java and
then run it on the same matrix.
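Before writing the program in C/C++/Java, it can help to see the merge loop in a few lines. The Python sketch below is one possible variant, assuming average-link merging on a precomputed similarity matrix and stopping once k clusters remain; the assignment does not prescribe a linkage or a stopping rule, so both are assumptions, as is the function name. It recomputes every cluster-pair average on each pass, so it is only suitable for small inputs.

```python
def agglomerative(sim, k):
    """Average-link agglomerative clustering.

    sim : symmetric n x n similarity matrix (list of lists)
    k   : number of clusters at which merging stops
    Returns a list of clusters, each a list of item indices.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average similarity over all cross-cluster pairs.
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        # Merge the most similar pair; b > a, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```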
Download the spmeans
code and compile it. Read its documentation.
on the classic3 matrix.
Run the various clustering techniques on the matrix of 300 documents.
Run the various clustering techniques on cmu.news 20_cleaned.
Answer the following questions:
1. Report your clustering results
on classic3, the 300-document set, and cmu.news 20_cleaned.
For each clustering, submit the confusion matrix and objective function
value (if available).
2. What is the number of clusters output by spmeans
for each of the data sets? Is it 3 for the classic3 matrix?
3. Are your clustering results good? If not, explain why.
4. In your opinion, which of the clustering techniques
is the best? Why?
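For the confusion matrices requested in question 1, a small helper such as the following may be enough. It is a sketch under the assumption that cluster ids and true class ids are both 0-based integers; the function name and the row/column orientation (rows are clusters, columns are true classes) are choices made for this example.

```python
def confusion_matrix(cluster_labels, class_labels, n_clusters, n_classes):
    """Count how many documents of each true class fall into each cluster."""
    matrix = [[0] * n_classes for _ in range(n_clusters)]
    for cluster, true_class in zip(cluster_labels, class_labels):
        matrix[cluster][true_class] += 1
    return matrix
```

A clustering that matches the true classes well has one dominant entry in each row and each column, up to a permutation of the cluster ids.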