Data Mining: A Mathematical Perspective

CS 391D/CAM 395T

CS Unique No. 54950 / CAM Unique No. 66117

Fall 2009
TTh 9:30-11am
WEL 2.312

Instructor: Prof. Inderjit Dhillon
Office: ACES 2.332
Office Hours: Tue 11am-noon and by appointment
TA: Wei Tang
Office: TAY 137
Office Hours: MW 3:30-5:30pm

Course Description

Data mining is the automated discovery of interesting patterns and relationships in massive data sets. This graduate course focuses on various mathematical and statistical aspects of data mining. Topics include supervised methods (regression, classification) and unsupervised methods (clustering, principal components analysis, dimensionality reduction). The technical tools used in the course draw from linear algebra, multivariate statistics, and optimization. The main tools from these areas will be covered in class, but undergraduate-level linear algebra is a pre-requisite (see below). A substantial portion of the course is devoted to research projects, for which students will choose a well-defined research problem. Projects can vary in their theoretical/mathematical content and in the amount of implementation/programming involved. Projects will be conducted by teams of 2-3 students.

Pre-requisites: Basics (undergraduate level) of linear algebra (M341 or equivalent) and some mathematical sophistication.

Books

  • "Pattern Recognition and Machine Learning" by C. Bishop, Springer, 2006.
  • "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by T. Hastie, R. Tibshirani, J. Friedman, Springer-Verlag, 2001.
  • "Pattern Classification" by R. Duda, P. Hart and D. Stork, John Wiley and Sons, 2000.

Reading Material

  • Matlab Tutorials
  • Linear Algebra Background
  • Class Lectures
  • Berkeley slides on Linear Dimensionality Reduction and Non-Linear Dimensionality Reduction

Homeworks

  • Hard copies of your homework solutions should be submitted in class on the due date.
  • Homework 1, Solutions.
  • Homework 2, Solutions.
  • Homework 3, Solutions.
  • Homework 4

Class Presentations

  • Schedule
  • Suggestions for Paper Readings

Class Projects

  • Project Suggestions

Grading

  • 40% Class Project (10% first submission + 30% final submission)
  • 20% Homeworks
  • 25% Midterm
  • 10% Class presentation of a research paper or book chapter/section
  • 5% Class participation and attendance

Code of Conduct