## Large-Scale Data Mining

## CS 395T

### Unique Number: 49460

### Course Announcement

Spring 2000

M-W 4:00-5:30pm

CPE 2.206

Professor: Inderjit Dhillon

Office: Taylor Hall 5.148

Office Hours: Wed 10:00-11:00am

TA: Shailesh Kumar

Office: ENS 518

Office Hours: Thurs 10am-1pm

### Class Projects

### Handouts

### Relevant Books (on reserve in PCL)

### Lectures

### Material to be covered

Mathematical preliminaries: basics of linear algebra.

SVD (Singular Value Decomposition) and its use in indexing documents, for example Latent Semantic Indexing (LSI).

- LSI page at Bellcore.
- LSI page at Univ. of Tennessee, Knoxville.
- Matrices, Vector Spaces and Information Retrieval by Michael W. Berry, Zlatko Drmac, Elizabeth R. Jessup.

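
As a rough illustration of how an SVD supports LSI-style retrieval, here is a minimal pure-Python sketch. It assumes a small dense term-by-document matrix `A` (rows are terms, columns are documents). It uses power iteration with deflation on `A^T A` as a stand-in for a proper SVD routine. A query vector `q` is folded into the rank-k space as `q_hat = S_k^{-1} U_k^T q` and compared by cosine similarity with the rows of `V_k`. This is one common variant, not necessarily the one used in the references above, and the function names are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [dot(row, x) for row in M]


def normalize(x):
    n = math.sqrt(dot(x, x))
    return [v / n for v in x]


def top_k_right_singular(A, k, iters=500):
    """Approximate the k largest singular values of A and the matching right
    singular vectors by power iteration with deflation on A^T A.
    Assumes k does not exceed the rank of A."""
    cols = [list(c) for c in zip(*A)]  # columns of A = document vectors
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]  # A^T A
    sigmas, vs = [], []
    for _ in range(k):
        v = normalize([1.0 + 0.1 * j for j in range(len(cols))])  # deterministic start
        for _ in range(iters):
            w = matvec(gram, v)
            for u in vs:  # deflation: remove directions already found
                c = dot(w, u)
                w = [wi - c * ui for wi, ui in zip(w, u)]
            v = normalize(w)
        sigmas.append(math.sqrt(dot(v, matvec(gram, v))))  # Rayleigh quotient = sigma^2
        vs.append(v)
    return sigmas, vs


def lsi_rank(A, query, k):
    """Rank the documents (columns of A) by cosine similarity to the query
    vector in a rank-k LSI space. Best match first."""
    sigmas, vs = top_k_right_singular(A, k)
    # Fold the query in: q_hat_i = (u_i . q) / sigma_i, where u_i = A v_i / sigma_i
    q_hat = [dot(matvec(A, v), query) / (s * s) for s, v in zip(sigmas, vs)]
    scores = []
    for j in range(len(A[0])):
        d_hat = [v[j] for v in vs]  # document j's coordinates: row j of V_k
        denom = math.sqrt(dot(d_hat, d_hat) * dot(q_hat, q_hat)) or 1.0
        scores.append(dot(d_hat, q_hat) / denom)
    return sorted(range(len(scores)), key=lambda j: -scores[j])
```

Consider a toy matrix with five terms (car, automobile, engine, flower, petal) and four documents: {car, engine}, {automobile, engine}, {flower, petal} and {flower}. With `k = 2`, a query containing only "car" ranks the {automobile, engine} document among the top two matches, even though that document never mentions "car".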

Clustering algorithms (agglomerative clustering, graph-based algorithms, k-means).

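
The k-means entry can be made concrete with a minimal sketch of Lloyd's iteration on Euclidean vectors. Each pass assigns every point to its nearest centroid, then moves each centroid to the mean of its points. Seeding with the first `k` points is an assumption made only to keep the sketch deterministic; practical code usually uses random or k-means++ seeding with several restarts. The `kmeans` function is hypothetical.

```python
def kmeans(points, k, max_iters=100):
    """Lloyd's k-means on a list of equal-length tuples.
    Returns (labels, centroids). Seeds with the first k points
    (a deterministic choice made for this sketch only)."""
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iters):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids
```

On the points (0, 0), (0, 1), (10, 10), (10, 11) with `k = 2`, it returns labels `[0, 0, 1, 1]` and centroids (0, 0.5) and (10, 10.5).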

Classification algorithms (linear discriminant analysis).

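
For two classes, Fisher's linear discriminant is one standard form of linear discriminant analysis. It projects onto `w = S_W^{-1} (m_1 - m_0)`, where `m_0` and `m_1` are the class means and `S_W` is the within-class scatter matrix. The sketch below assumes `S_W` is nonsingular and, as a simplifying assumption, thresholds at the midpoint of the projected class means. The small `solve` helper stands in for a library linear solver, and all names are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def fisher_lda(class0, class1):
    """Two-class Fisher linear discriminant. Returns (w, threshold);
    predict class 1 when w . x > threshold."""
    m0, m1 = mean(class0), mean(class1)
    d = len(m0)
    # Within-class scatter: S_W = sum over both classes of (x - m)(x - m)^T
    sw = [[0.0] * d for _ in range(d)]
    for rows, m in ((class0, m0), (class1, m1)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, m)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = solve(sw, [a - b for a, b in zip(m1, m0)])  # w = S_W^{-1} (m1 - m0)

    def proj(v):
        return sum(wi * vi for wi, vi in zip(w, v))

    threshold = (proj(m0) + proj(m1)) / 2  # midpoint of the projected class means
    return w, threshold
```

The midpoint threshold ignores class priors; a fuller treatment would adjust the cutoff for unequal class sizes or misclassification costs.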

Focused crawling of the WWW.

- Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery by Soumen Chakrabarti, Martin van den Berg and Byron Dom.


Data visualization (Self-Organizing Maps (SOMs), class-preserving projections).

- Class Visualization of High-Dimensional Data with Applications by Inderjit Dhillon, Dharmendra Modha, Scott Spangler, 1999. Free software is available.
- XGobi, a system for multivariate data visualization, by Deborah Swayne, Di Cook and Andreas Buja at Bellcore. The same page covers XGvis, which draws graphs using multidimensional scaling (MDS) and was developed by Andreas Buja, Deborah F. Swayne, Michael L. Littman and Nathaniel Dean. Free software is available from that page.
- WEBSOM (Self-Organizing Maps for Internet Exploration) uses Kohonen's self-organizing maps to plot 2-D maps of text documents. The WEBSOM page has a demo for visually browsing newsgroup data.

Support Vector Machines (SVMs) and their application to document classification.

Graph partitioning with applications to image segmentation.

- Lecture notes 1 & 2 on graph partitioning by Jim Demmel.
- Normalized Cuts and Image Segmentation by Jianbo Shi and Jitendra Malik.
- Motion Segmentation and Tracking Using Normalized Cuts by Jianbo Shi and Jitendra Malik.
- The METIS Graph Partitioning Package.

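
The normalized-cut idea can be sketched as a spectral bipartition in the spirit of Shi and Malik. The sketch assumes a small symmetric weight matrix with no isolated vertices. It approximates the eigenvector for the second-smallest eigenvalue of the generalized problem `(D - W) y = lambda D y` by power iteration on a shifted, normalized affinity matrix, then splits at zero. Shi and Malik also search over other splitting points, and large graphs would call for a sparse (e.g. Lanczos) eigensolver, so treat this as an illustration only. The function name is hypothetical.

```python
import math


def ncut_bipartition(W, iters=500):
    """Spectral bipartition of a weighted undirected graph, normalized-cut style.
    W is a symmetric list-of-lists weight matrix with no isolated vertices.
    Returns a 0/1 label per vertex."""
    n = len(W)
    d = [float(sum(row)) for row in W]  # vertex degrees
    s = [1.0 / math.sqrt(di) for di in d]  # diagonal of D^{-1/2}
    # Shifted normalized affinity M = D^{-1/2} W D^{-1/2} + I; its eigenvalues lie in [0, 2]
    M = [[s[i] * W[i][j] * s[j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Known top eigenvector of M (eigenvalue 2): z1 proportional to D^{1/2} 1
    z1 = [math.sqrt(di) for di in d]
    norm = math.sqrt(sum(v * v for v in z1))
    z1 = [v / norm for v in z1]
    z = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        c = sum(a * b for a, b in zip(z, z1))
        z = [a - c * b for a, b in zip(z, z1)]  # stay orthogonal to z1
        z = [sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in z))
        z = [v / norm for v in z]
    # Map back to the generalized problem: y = D^{-1/2} z
    y = [si * zi for si, zi in zip(s, z)]
    return [1 if yi > 0 else 0 for yi in y]  # split at zero
```

On a graph of two unit-weight triangles joined by a single edge of weight 0.1, the sketch separates the two triangles.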

SVD in face recognition.

- Papers and Faces Database by Larry Sirovich.
- Eigenfaces and Face Recognition at the MIT Media Lab.
- Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection by Peter Belhumeur, João Hespanha and David Kriegman, July 1997.


Analyzing the graph of the WWW (hubs and authorities, the CLEVER project at IBM, PageRank at Google).

- Authoritative sources in a hyperlinked environment by Jon Kleinberg.
- The CLEVER project at IBM Almaden.
- Hypersearching the Web by members of the CLEVER project.

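
The hubs-and-authorities idea can be sketched as Kleinberg-style power iteration. The sketch assumes the link graph is given as a small dense 0/1 adjacency matrix; a crawl-sized graph would need sparse storage. It is not the CLEVER system's code. PageRank uses a different, random-surfer iteration and is not shown. The `hits` function is hypothetical.

```python
import math


def hits(adj, iters=100):
    """Hub and authority scores by power iteration.
    adj[i][j] = 1 if page i links to page j. Returns (hubs, authorities),
    each scaled to unit Euclidean length."""
    n = len(adj)
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iters):
        # Authority update: a_j = sum of hub scores of pages linking to j (a = A^T h)
        auths = [sum(hubs[i] * adj[i][j] for i in range(n)) for j in range(n)]
        # Hub update: h_i = sum of authority scores of pages that i links to (h = A a)
        hubs = [sum(adj[i][j] * auths[j] for j in range(n)) for i in range(n)]
        for vec in (auths, hubs):
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vec[:] = [v / norm for v in vec]
    return hubs, auths
```

Consider a toy graph in which pages 0 and 1 both link to pages 2 and 3, and page 2 links to page 3. Page 3 gets the highest authority score, and pages 0 and 1 tie as the best hubs.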
### Related Courses

- Stanford's CS 349, Data Mining, Search, and the World Wide Web, Fall 1998.
- UC Berkeley's CS 294-7, Large Datasets, Fall 1999.
- UT Austin ECE course EE 380L, A Practicum in Data Mining, Fall 1999.
- Princeton's CIS 700/702, Information Retrieval, ?.