August
27 | Course overview: Parallel architectures, parallel algorithms, parallel data structures Slides: Introduction to CS 395T Readings: (1) Moore's Law paper, Electronics, 1965. (2) Static Power Model for Architects, Butts and Sohi, Micro 2000. (3) Introduction to the Cell processor, Kahle et al., IBM J. Res. & Dev., July 2005 (4) Amorphous Data-Parallelism, Pingali et al., 2009 |
September
1 | Algorithms (I): Parallelism in Computational Science Algorithms (a) Ordinary differential equations (ODEs), finite differences, systems of ODEs Presenter: Keshav Pingali Slides: Some computational science algorithms Readings: (1) Mathematica tutorial on numerical methods for solving PDEs |
3 | Algorithms (II): Parallelism in Computational Science Algorithms (b) Partial differential equations (PDEs), linear system solvers, finite-element method Presenter: Keshav Pingali Slides: see September 1 lecture |
8 | Algorithms (III): Parallelism in Irregular Algorithms (a) Amorphous data-parallelism, N-body methods, mesh generation, mesh refinement Presenters: Amber Hassan and Xin Sui Slides: Introduction to Irregular Algorithms Readings: (1) Data parallel algorithms, Hillis and Steele, CACM, 1986 (2) Amorphous Data-Parallelism, Pingali et al., 2009 |
10 | Algorithms (IV): Parallelism in Irregular Algorithms (b) Mesh refinement, maxflow algorithms, event-driven simulation Presenters: Amber Hassan and Xin Sui Slides: (1) Barnes-Hut (2) Mesh Generation and Graph Partitioning (3) Preflow Push |
15 | Abstractions for regular algorithms and machines: Dependence graphs, data dependences, control dependences, PRAM model, DAG scheduling Presenter: Keshav Pingali Slides: (1) Algorithm and machine abstractions: dependence graphs and PRAM model (2) Control dependence computation Readings: (1) Dependence graphs and compiler optimizations, Kuck et al., POPL 1981 (2) The program dependence graph and its use in optimization, Ferrante, Ottenstein, Warren, TOPLAS, 1987 (3) Optimal control dependence computation, Pingali and Bilardi, TOPLAS, 1997 (4) Experimental evaluation of list scheduling, Cooper et al., Rice TR, 1998 (5) From control flow to dataflow, Beck et al., JPDC 1989 |
17 | Abstractions for irregular algorithms and machines: halographs, optimistic execution of programs, dynamic scheduling Presenter: Donald Nguyen Slides: see amorphous data-parallelism slides |
22 | Architecture (I): Multicore architectures, cache coherence Presenters: Manish Arora, Mrinal Deo Slides: Coherent caches |
24 | Architecture (II): Locks, lock-free synchronization, memory consistency models Presenter: Ivan Jibaja Slides: Memory consistency models |
29 | Dynamic load-balancing Presenters: Rashid Kaleem, Amber Hassan Slides: Dynamic load-balancing Readings: (1) Load Balancing literature survey (2) Scheduling multi-threaded computations by work-stealing, Blumofe and Leiserson, JACM, 1999. |
October
1 | Parallel data structures (I): Lock/wait-free data structures Presenter: Augustine Matthews Slides: Locks and lock-free synchronization |
6 | Parallel data structures (II): Galois data structures, array and graph partitioning Presenter: Donald Nguyen Readings: (1) An efficient heuristic procedure for partitioning graphs, Kernighan and Lin, Bell System Technical Journal, 1970. (2) A fast and high quality multilevel scheme etc. Karypis and Kumar, SIAM J. Sci. Comput. 1998. |
8 | Parallel data structures (III): Transactional memory Presenters: Srivastava Daruru, Saurabh Shukla Readings: (1) Software Transactional Memory, Nir Shavit, Dan Touitou, PODC 1995 (2) Transactional Memory: Architectural Support for Lock-Free Data Structures, Maurice Herlihy, J. Eliot B. Moss, ISCA 1993. |
13 | Locality (I): Temporal and spatial locality in algorithms, blocking, unit-stride accesses Presenter: Keshav Pingali Slides: Cache models for locality Readings: (1) Evaluation techniques for storage hierarchies, Mattson et al., IBM Systems Journal, 1970. |
15 | Locality (II): Case studies: MMM, matrix factorization, stencil codes Presenter: Keshav Pingali Readings: (1) Anatomy of high-performance matrix multiplication, Goto et al., ACM TOMS, May 2008. (2) Optimizing matrix multiply using PHiPAC, Bilmes et al., LAPACK Working Note 111. |
20 | Locality (III): Cache-oblivious algorithms Presenter: Keshav Pingali Readings: (1) Cache-oblivious algorithms, Frigo et al., FOCS 99 (2) An experimental comparison of cache-oblivious and cache-conscious programs, Yotov et al., SPAA 2007 |
22 | Compiler analysis and transformation (I): Integer linear programming, dependence analysis of dense array programs Presenter: Keshav Pingali Readings: (1) The Omega test, Pugh, Supercomputing 91 |
27 | Compiler analysis and transformation (II): Loop transformations of dense array programs Presenter: Keshav Pingali |
29 | Compiler analysis and transformation (III): Points-to and shape analysis Presenter: Dimitrios Prountzos Slides: Analysis of programs with pointers Readings: (1) Tutorial on points-to analysis, Michael Hind |
November
3 | Performance modeling: PRAM, BPRAM, LogP Presenter: Keshav Pingali |
5 | Auto-tuning (I): ATLAS, FFTW Presenter: Keshav Pingali Slides: Optimizing MMM and the ATLAS code generator Readings: (1) Is search really necessary to generate high-performance BLAS?, Yotov et al., Proceedings of the IEEE, March 2005. |
10 | Auto-tuning (II): Machine learning techniques for program optimization Presenter: Amin Shali |
12 | Special topics: GPU programming Presenter: Apollo Ellis Readings: (1) A survey of general-purpose computation on graphics hardware, Owens et al., Eurographics 2005. |
17 | Parallel language/library case studies (I): MPI Presenter: Keshav Pingali Slides: Introduction to MPI, Advanced MPI Readings: (1) MPI Groups and Topologies, J. Squyres, ClusterWorld 2004 |
19 | Parallel language/library case studies (II): PGAS languages Presenters: TBD |
24 | Parallel language/library case studies (III): Cilk, TBB, MapReduce Presenters: Sangmin Lee, Yang Wang Readings: (1) Cilk: An efficient multithreaded runtime system, Blumofe et al., PPoPP 1995 |
December
1 | Parallel language/library case studies (IV): Functional languages and dataflow Presenter: Keshav Pingali |
3 | Research directions in parallel programming Presenter: Keshav Pingali |