CS 395T - Large-scale Data Mining

Homework 2


    The main goal of this homework is to experiment with some clustering techniques.

    Answer the following questions:

    1. Report your clustering results for each of the techniques on classic3, the 300-document set, and cmu.news 20_cleaned. For each clustering, submit the confusion matrix and the objective function value (if available).
    2. What is the number of clusters output by spmeans for each of the data sets? Is it 3 for the classic3 data?
    3. Are your clustering results good? If not, explain why.
    4. In your opinion, which of the clustering techniques is the best? Why?
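
    The confusion matrix asked for in question 1 can be tabulated with a few lines of code. The following is a minimal sketch, not a required implementation: it assumes you have one true class label and one cluster id per document, and the function name confusion_matrix, the class names "A", "B", "C", and the toy assignments are all hypothetical placeholders.

```python
from collections import Counter


def confusion_matrix(true_labels, cluster_ids):
    """Count documents per (cluster, true class) pair.

    Returns the sorted class names, the sorted cluster ids, and a matrix
    whose rows are clusters and whose columns are true classes.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("need exactly one cluster id per document")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    counts = Counter(zip(cluster_ids, true_labels))
    matrix = [[counts[(k, c)] for c in classes] for k in clusters]
    return classes, clusters, matrix


# Hypothetical toy data: six documents, three true classes, three clusters.
true_labels = ["A", "A", "B", "B", "C", "C"]
cluster_ids = [0, 0, 1, 1, 2, 0]

classes, clusters, matrix = confusion_matrix(true_labels, cluster_ids)
print("cluster", classes)
for k, row in zip(clusters, matrix):
    print(k, row)
```

    A clustering that matches the true classes well has one dominant entry in each row and each column; counts spread across a row indicate a cluster that mixes several classes.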