Scalable Machine Learning

CS 395T

Unique No. 53290

Fall 2014
Mon 2-5pm
GAR 0.132

Instructor: Prof. Inderjit Dhillon
Office: GDC 4.704
Office Hours: Tue 2-3pm and by appointment

TA: Nagarajan Natarajan
Office: GDC 4.802A
Office Hours: Mon 10am-noon, Fri 3-4:30pm

Course Description

This is an advanced, project-based, research-oriented course in machine learning for big data. The course emphasizes developing scalable and parallel algorithms and software for various machine learning tasks, with special emphasis on optimization algorithms and related software. In addition to lectures on background material by the instructor, the course will lean heavily on discussions and paper presentations led by students. Students will be expected to participate actively in class discussions, especially those resulting from programming assignments and class projects. Topics expected to be covered include regression, classification, clustering, dimensionality reduction, topic modeling, matrix completion, social network analysis, parallel programming, Hadoop, MapReduce, OpenMP, MPI, GraphLab, Galois, coordinate descent, stochastic gradient descent, first-order methods, and Newton methods. A substantial portion of the course will focus on research projects, in which students choose a well-defined research problem. All projects are expected to involve a fair amount of implementation and programming (mostly on parallel machines), though some may lean more on theoretical or mathematical content. Projects will be conducted by teams of two.

IMPORTANT: Students are expected to have taken either a graduate course in (a) machine learning or (b) parallel computing. Students are expected to have a solid background in linear algebra and optimization.

Grading

  • 50% Class Project (10% first submission + 40% final submission)
  • 30% Homeworks
  • 15% Class Presentations
  • 5% Class Participation and Attendance
  • Code of Conduct