## Scalable Machine Learning

## CS 395T

### Unique No. 53290

Fall 2014

Mon 2-5pm

GAR 0.132

Instructor: Prof. Inderjit Dhillon

Office: GDC 4.704

Office Hours: Tue 2-3pm and by appointment

TA: Nagarajan Natarajan

Office: GDC 4.802A

Office Hours: Mon 10am-noon, Fri 3-4:30pm

### Course Description

This is an **advanced, project-based, research-oriented course** in machine learning for big data. The course emphasizes developing scalable and parallel algorithms and software for a variety of machine learning tasks, with special emphasis on optimization algorithms and related software. In addition to lectures on background material by the instructor, the course will rely heavily on discussions and paper presentations led by students. Students are expected to participate actively in class discussions, especially those arising from programming assignments and class projects.

Topics expected to be covered include regression, classification, clustering, dimensionality reduction, topic modeling, matrix completion, social network analysis, parallel programming, Hadoop, MapReduce, OpenMP, MPI, GraphLab, Galois, coordinate descent, stochastic gradient descent, first-order methods, and Newton methods.

A substantial portion of the course will focus on research projects, in which students choose a well-defined research problem. All projects are expected to involve a fair amount of implementation and programming (mostly on parallel machines), though some may lean more on theoretical or mathematical content. Projects will be conducted by teams of two.

IMPORTANT: Students are expected to have taken a graduate course in either (a) machine learning or (b) parallel computing, and to have a solid background in linear algebra and optimization.

### Grading