Course Description
How do we make parallel programming mainstream? This is one of the most important problems facing computer systems researchers today, and solving it is the key to unlocking the performance potential of multicore processors. This seminar course focuses on recent breakthroughs that attempt to address this problem by exploiting the deep structure of parallelism and locality in algorithms. These ideas are useful not only for computational science applications but also for complex applications from other domains such as machine learning, big data, and games.

This semester, the course will focus on (i) how to exploit parallelism for machine learning and big-data applications, and (ii) how to exploit approximation to reduce power and energy consumption. Both topics are the subject of much current research in the systems and machine learning communities, and a variety of domain-specific languages (DSLs) and implementations have recently been proposed for these domains, targeting both shared-memory and distributed-memory architectures.

Topics include the following:
  1. Structure of parallelism and locality in important algorithms in computational science and machine learning
  2. Algorithm abstractions: operator formulation of algorithms, dependence graphs
  3. Multicore architectures: interconnection networks, cache coherence, memory consistency models, synchronization
  4. Scheduling and load balancing
  5. Parallel data structures: lock-free data structures, array/graph partitioning
  6. Memory hierarchies and locality, cache-oblivious algorithms
  7. Compiler analysis and transformations
  8. Performance models: PRAM, BPRAM, LogP
  9. Self-optimizing software, auto-tuning
  10. GPUs and GPU programming
  11. Case studies: Cilk, MPI, OpenMP, MapReduce, Galois, GraphLab
  12. Approximate computing for power and energy optimization

Students will present papers, participate in discussions, and complete a substantial final project. The readings will include some of the classic papers in the field of parallel programming. In addition, there will be a small number of programming assignments and homework problems at the beginning of the semester. Some of the lectures in the course will be given by Inderjit Dhillon and Pradeep Ravikumar, who are experts in machine learning.

Prerequisites: programming maturity, knowledge of C/C++, and basic courses on modern computer architecture and compilers.

For basic material on computer architecture, read "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson (Morgan Kaufmann Publishers). For basic material on compilers, read "Optimizing Compilers for Modern Architectures" by Allen and Kennedy.

Lecture schedule and notes