Fall 2003
CS395T / CAM395T "Large-Scale Data Mining"
M-W 4-5:30pm, GEO 2.102
Prof. Inderjit Dhillon

This is a project-oriented seminar course in data mining, with emphasis on text data mining and bioinformatics. Topics covered will include algorithms for (a) web search using link analysis (Google, HITS), (b) clustering, (c) classification, (d) factor analysis / low-rank approximations, and (e) multidimensional visualization.

The technical tools used in the course will come from linear algebra, multivariate statistics, information theory, and optimization. The concepts used from these technical areas will be covered in class, but undergraduate-level linear algebra is a prerequisite.

A substantial portion of the course will concentrate on research projects, in which students will choose a well-defined problem. Students will have freedom in choosing projects, as long as they are related to data mining. Projects can vary in their theoretical/mathematical content and in the amount of implementation/programming involved. Projects will be conducted by teams of two to three students.