CS 378: Lecture schedule

August

27	Course overview: Parallel architectures, parallel algorithms, parallel data structures
Slides: Introduction to CS 395T
Readings:
(1) Moore's Law paper, Electronics, 1965.
(2) Static Power Model for Architects, Butts and Sohi, Micro 2000.
(3) Introduction to the Cell processor, Kahle et al., IBM J. Res. & Dev., July 2005.
(4) Amorphous Data-Parallelism, Pingali et al., 2009

September

1	Algorithms (I): Parallelism in Computational Science Algorithms (a) Ordinary differential equations (ODEs), finite differences, systems of ODEs
Presenter: Keshav Pingali
Slides: Some computational science algorithms
Readings:
(1) Mathematica tutorial on numerical methods for solving PDEs
3	Algorithms (II): Parallelism in Computational Science Algorithms (b) Partial differential equations (PDEs), linear system solvers, finite-element method
Presenter: Keshav Pingali
Slides: see September 1 lecture
8	Algorithms (III): Parallelism in Irregular algorithms (a) Amorphous data-parallelism, N-body methods, mesh generation, mesh refinement
Presenters: Amber Hassan and Xin Sui
Slides: Introduction to Irregular Algorithms
Readings:
(1) Data parallel algorithms, Hillis and Steele, CACM, 1986
(2) Amorphous Data-Parallelism, Pingali et al., 2009
10	Algorithms (IV): Parallelism in Irregular algorithms (b) Mesh refinement, Maxflow algorithms, event-driven simulation
Presenters: Amber Hassan and Xin Sui
Slides:
(1) Barnes-Hut
(2) Mesh Generation and Graph Partitioning
(3) Preflow Push
15	Abstractions for regular algorithms and machines: Dependence graphs, data dependences, control dependences, PRAM model, DAG scheduling
Presenter: Keshav Pingali
Slides:
(1) Algorithm and machine abstractions: dependence graphs and PRAM model
(2) Control dependence computation
Readings:
(1) Dependence graphs and compiler optimizations, Kuck et al., POPL 1981
(2) The program dependence graph and its use in optimization, Ferrante, Ottenstein, Warren, TOPLAS, 1987
(3) Optimal control dependence computation, Pingali and Bilardi, TOPLAS, 1997
(4) Experimental evaluation of list scheduling, Cooper et al., Rice TR, 1998
(5) From control flow to dataflow, Beck et al., JPDC 1989
17	Abstractions for irregular algorithms and machines: halographs, optimistic execution of programs, dynamic scheduling
Presenter: Donald Nguyen
Slides: see amorphous data-parallelism slides
22	Architecture (I): Multicore architectures, cache coherence
Presenters: Manish Arora, Mrinal Deo
Slides: Coherent caches
24	Architecture (II): Locks, lock-free synchronization, memory consistency models
Presenter: Ivan Jibaja
Slides: Memory consistency models
29	Dynamic load-balancing
Presenters: Rashid Kaleem, Amber Hassan
Slides: Dynamic load-balancing
Readings:
(1) Load Balancing literature survey
(2) Scheduling multi-threaded computations by work-stealing, Blumofe and Leiserson, JACM, 1999.

October

1	Parallel data structures (I): Lock/wait-free data structures
Presenter: Augustine Matthews
Slides: Locks and lock-free synchronization
6	Parallel data structures (II): Galois data structures, array and graph partitioning
Presenter: Donald Nguyen
Readings:
(1) An efficient heuristic procedure for partitioning graphs, Kernighan and Lin, Bell System Technical Journal, 1970.
(2) A fast and high quality multilevel scheme etc., Karypis and Kumar, SIAM J. Sci. Comput., 1998.
8	Parallel data structures (III): Transactional memory
Presenters: Srivastava Daruru, Saurabh Shukla
Readings:
(1) Software Transactional Memory, Nir Shavit, Dan Touitou, PODC 1995
(2) Transactional Memory: Architectural Support for Lock-Free Data Structures, Maurice Herlihy, J. Eliot B. Moss, ISCA 1993.
13	Locality (I): Temporal and spatial locality in algorithms, blocking, unit-stride accesses
Presenter: Keshav Pingali
Slides: Cache models for locality
Readings:
(1) Evaluation techniques for storage hierarchies, Mattson et al., IBM Systems Journal, 1970.
15	Locality (II): Case studies: MMM, matrix factorization, stencil codes
Presenter: Keshav Pingali
Readings:
(1) Anatomy of high-performance matrix multiplication, Goto et al., ACM TOMS, May 2008.
(2) Optimizing matrix multiply using PHiPAC, Bilmes et al., LAPACK Working Note 111.
20	Locality (III): Cache-oblivious algorithms
Presenter: Keshav Pingali
Readings:
(1) Cache-oblivious algorithms, Frigo et al., FOCS 99
(2) An experimental comparison of cache-oblivious and cache-conscious programs, Yotov et al., SPAA 2007
22	Compiler analysis and transformation (I): Integer linear programming, dependence analysis of dense array programs
Presenter: Keshav Pingali
Readings:
(1) The Omega test, Pugh, Supercomputing 91
27	Compiler analysis and transformation (II): Loop transformations of dense array programs
Presenter: Keshav Pingali
29	Compiler analysis and transformation (III): Points-to and shape analysis
Presenter: Dimitrios Prountzos
Slides: Analysis of programs with pointers
Readings:
(1) Tutorial on points-to analysis, Michael Hind

November

3	Performance modeling: PRAM, BPRAM, LogP
Presenter: Keshav Pingali
5	Auto-tuning (I): ATLAS, FFTW
Presenter: Keshav Pingali
Slides: Optimizing MMM and the ATLAS code generator
Readings:
(1) Is search really necessary to generate high-performance BLAS?, Yotov et al., Proceedings of IEEE, March 2005.
10	Auto-tuning (II): Machine learning techniques for program optimization
Presenter: Amin Shali
12	Special topics: GPU programming
Presenter: Apollo Ellis
Readings:
(1) A survey of general-purpose computation on graphics hardware, Owens et al., Eurographics 2005.
17	Parallel language/library case studies (I): MPI
Presenter: Keshav Pingali
Slides: Introduction to MPI, Advanced MPI
Readings:
(1) MPI Groups and Topologies, J. Squyres, ClusterWorld 2004
19	Parallel language/library case studies (II): PGAS languages
Presenters: TBD
24	Parallel language/library case studies (III): Cilk, TBB, Map-reduce
Presenters: Sangmin Lee, Yang Wang
Readings:
(1) Cilk, an efficient multithreaded runtime system, Blumofe et al., PPoPP 1995

December

1	Parallel language/library case studies (IV): functional languages and dataflow
Presenter: Keshav Pingali
3	Research directions in parallel programming
Presenter: Keshav Pingali
