Lecture Schedule

Background
 
                Course overview

                  
                Basics of computer architecture: pipelined and out-of-order (OOO) execution processors
                   Another useful set of slides on OOO processors
                   Lectures from the ECE architecture course
 
Measurement

                  
                Measurements: timing and PAPI counters

Compilers
 
               
                x86 ISA and compilers

Sources of parallelism and locality in algorithms

                  Graph algorithms
                     Additional reading: "The Tao of Parallelism in Algorithms," Pingali et al., PLDI 2011.

                  Computational science algorithms
                  Video

Caches                  
                  Cache architecture and memory hierarchy
                  Basics of cache organization (Lecture slides from Lin/McKinley)
                  Video

                  Locality, loop and data transformations
                  Video

                  Case study of locality enhancement: GEMM and ATLAS

                  Intel VTune (I) profiler for performance analysis

 
Shared-memory programming
               

                Work and span              

                
                Shared-memory architectures: cache coherence

                Pthreads programs (3 lectures)
                Video

               
                Memory consistency

                OpenMP

                Parallel-prefix
 
Vectorization
            
                Vectorization (2 lectures)
 

            
MPI (3 lectures)   

GPUs
            
                GPU architecture and CUDA