CS 377P: Programming for Performance
Assignment 1: Performance counters
Due date: February 10, 2021, 10:00PM
Late submission policy: Submissions can be at most 1 day
late. There will be a 10% penalty for late submissions.
Description
1) Write C code for the 6 variants of matrix-matrix multiply
(MMM) that you can generate by permuting the loops in the standard
triply nested loop version of MMM. The matrix elements should be
of type double.
Hint: To check cache sizes on the
machine, run: lscpu
2) Answer the following questions, using a few sentences for each
one.
- What are data and control dependences? Give simple
examples to illustrate these concepts.
- Explain out-of-order execution and in-order
retirement/commit. Why do high-performance processors
execute instructions out of order but retire them in order?
What hardware structure(s) are used to implement in-order
retirement?
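For item 1, a minimal sketch of the baseline loop order may help as a starting point. It assumes square n x n matrices stored row-major in flat arrays; that layout and the function name mmm_ijk are illustrative choices, not requirements of the assignment.

```c
#include <stddef.h>

/* Baseline i-j-k variant: accumulates C += A * B for n x n matrices of
 * doubles stored row-major in flat arrays (an assumed layout).
 * The caller must zero C beforehand to obtain the plain product. */
void mmm_ijk(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```

The other five variants keep the same loop body and only reorder the three for statements. Since k still increases for every (i, j) pair, each element of C is summed in the same order in all six variants.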
Deliverables
Submit (in Canvas) the following two files:
- A .tar.gz file with your code, a README.txt and a
Makefile.
- The README.txt describes how to run your program and what
the output will be. A reasonable output will be pairs of
"name of measured event, value".
- The Makefile must produce an executable named
mmm when only "make" is run on any of the 8 CS machines.
- The executable mmm, when no flag is given,
should open a file named matrix.txt and write the matrix
multiplication result to res.txt. The filenames are fixed because
string parsing in C++ is not fun for some people and is not the focus of the class.
- You may add other flags to specify which loop-order variant to run,
the matrix size, or the input/output filenames, to make experiments easier. These flags
are entirely up to you, as long as they do not contradict the previous requirement.
- Here are a sample matrix.txt and res.txt.
The first line of matrix.txt contains two numbers, the number of rows and the number of columns of
the first matrix, followed by the first matrix with each row separated by a newline and
each element separated by a space. Then another line follows containing the dimensions of
the second matrix, followed by its contents.
- Matrix files should NOT be included in the submission, as they can be very large.
When running experiments, you can generate the matrices in memory instead of reading them from files.
- The tar file structure should look like this: ./{Makefile,README.txt,[source code files]},
not like this: ./ProjectDirectoryName/{Makefile,README.txt,[source code files]}.
- A report (in .pdf) containing the tables, and the answers to
the questions in both parts.
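The matrix.txt layout described above (a "rows columns" line followed by the elements) can be read with a short helper. The sketch below assumes whitespace-separated values and row-major storage. The name read_matrix is hypothetical, and the code does not guard against overflow in rows * cols.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one matrix (a "rows cols" header followed by
 * rows * cols values) from fp into a newly allocated row-major array.
 * Returns NULL on a malformed header, missing data, or allocation failure.
 * The caller owns the returned buffer and must free() it. */
double *read_matrix(FILE *fp, size_t *rows, size_t *cols)
{
    if (fscanf(fp, "%zu %zu", rows, cols) != 2 || *rows == 0 || *cols == 0)
        return NULL;

    size_t count = *rows * *cols;
    double *m = malloc(count * sizeof *m);
    if (m == NULL)
        return NULL;

    /* fscanf skips spaces and newlines alike, so row breaks need no
     * special handling. */
    for (size_t i = 0; i < count; i++) {
        if (fscanf(fp, "%lf", &m[i]) != 1) {
            free(m);
            return NULL;
        }
    }
    return m;
}
```

Calling read_matrix twice on the same open file yields the first and then the second matrix. Before multiplying, it is worth checking that the column count of the first equals the row count of the second.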
Grading
Code: 40 points
Measurements (plots): 30 points
Explanation: 10 points
Answers to short questions in (2): 20 points
PAPI:
To see which PAPI counters are available on a host, run:
papi_avail
To see which PAPI counters can be collected at the same time,
run:
papi_event_chooser
Read the PAPI manual http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:EventSets
and http://icl.cs.utk.edu/papi/docs/index.html
for more information, including example code.
"Warning! num_cntrs is more than num_mpx_cntrs" can be ignored.
ICC:
To run ICC on the indicated CS machines, run:
export PATH=$PATH:/opt/intel/bin
icc [compiler commands]
To check the availability of icc, run:
icc -v