Lecture Schedule
Background
Course overview
Basics of computer architecture: pipelined and out-of-order (OOO) execution processors
Another useful set of slides on OOO processors
Lectures from the ECE architecture course
Measurement
Measurements: timing and PAPI counters
Compilers
x86 ISA and compilers
Sources of parallelism and locality in algorithms
Graph algorithms
Additional reading: "The Tao of Parallelism in Algorithms," Pingali et al., PLDI 2011.
Computational science algorithms
Video
Caches
Cache architecture and memory hierarchy
Basics of cache organization (lecture slides from Lin/McKinley)
Video
Locality, loop and data transformations
Video
Case study of locality enhancement: GEMM and ATLAS
Intel VTune (I) profiler for performance analysis
Shared-memory programming
Work and span
Shared-memory architectures: cache coherence
Pthreads programs (3 lectures)
Video
Memory consistency
OpenMP
Parallel-prefix
Vectorization
Vectorization (2 lectures)
MPI (3 lectures)
GPUs
GPU architecture and CUDA