CS 377P: Programming for Performance
Assignment 2: Performance counters
Due date: February 14, 2017
Late submission policy: Submissions can be at most 2
days late. There will be a 10% penalty for each day after the due
Write C code for the 6 variants of matrix-matrix multiply
(MMM) you can generate by permuting loops in the standard
triply nested loop version of MMM. The data type in the matrix should
Hint: To check cache sizes on the machine,
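The six loop orders described above can be sketched as follows. This is only an illustration, not the required implementation: the element type double, the row-major layout, and the function names (mmm_ijk and so on) are assumptions, since the text above does not fix them.

```c
#include <assert.h>

/* Generates one C += A * B routine for a given loop order
   (outer, middle, inner). Matrices are n x n and stored row-major.
   The element type double is an assumption, not taken from the handout. */
#define DEFINE_MMM(name, outer, middle, inner)                          \
    void name(int n, const double *A, const double *B, double *C) {     \
        int i, j, k;                                                    \
        assert(n > 0);                                                  \
        for (outer = 0; outer < n; outer++)                             \
            for (middle = 0; middle < n; middle++)                      \
                for (inner = 0; inner < n; inner++)                     \
                    C[i * n + j] += A[i * n + k] * B[k * n + j];        \
    }

/* Hypothetical names, one per permutation of the three loops. */
DEFINE_MMM(mmm_ijk, i, j, k)
DEFINE_MMM(mmm_jik, j, i, k)
DEFINE_MMM(mmm_jki, j, k, i)
DEFINE_MMM(mmm_kji, k, j, i)
DEFINE_MMM(mmm_ikj, i, k, j)
DEFINE_MMM(mmm_kij, k, i, j)
```

C must be zeroed before each call, because the routines accumulate into it. In all six orders the products for a given C[i][j] are added in increasing k, so the variants should agree as long as the compiler does not reorder the floating-point arithmetic. This makes comparing against one reference variant a convenient correctness check.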
- Instrument your implementations to use PAPI to measure:
- Total cycles
- Total instructions
- Total Load Store Instructions
- Total Floating Point Instructions
- L1 data cache accesses and misses
- L2 data cache accesses and misses
- Compile your code with ICC using the flags '-O3 -fp-model precise'.
- If you need full C++14 support, you may also consider another compiler such as gcc with '-O3'.
- Using Stampede at TACC, collect these measurements for three
matrix sizes: 50x50, 200x200 and 2000x2000. To ensure no
interference with other processes, submit your runs to the job
scheduler - use the 'serial' queue.
- Create 4 tables in which the rows correspond to the loop-order
variant (i-j-k, j-i-k, j-k-i, k-j-i, i-k-j, k-i-j) and the
columns correspond to the matrix size, and fill in each table
with: L1 miss rate, L2 miss rate, total load and store instructions,
and number of committed floating point instructions.
Please keep the row and column order exactly as specified above!
- Answer the following questions.
- For the smallest matrix size, do the L1 and L2 miss rates
vary for the different loop-order variants? Do they vary for
the larger matrix sizes? Is there any difference in behavior
between the different problem sizes? Can you explain the
reasons for this behavior?
- Re-instrument your code by removing the PAPI calls and using
clock_gettime to measure the execution times for the six
versions of MMM and the three matrix sizes specified above.
How do your measurements compare to the execution times you
obtained from using PAPI to measure the total number of
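A minimal sketch of timing one MMM call with clock_gettime is shown below. The helper names elapsed_seconds and time_mmm, the kernel signature, and the choice of CLOCK_MONOTONIC are assumptions made for illustration; the assignment does not prescribe them.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two clock_gettime readings. */
double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) * 1e-9;
}

/* Times a single call of kernel(n, A, B, C) and returns wall-clock seconds.
   The kernel signature is hypothetical: any routine that multiplies two
   n x n matrices of doubles into C would fit. */
double time_mmm(void (*kernel)(int, const double *, const double *, double *),
                int n, const double *A, const double *B, double *C) {
    struct timespec t0, t1;
    assert(kernel != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);  /* reading before the kernel */
    kernel(n, A, B, C);
    clock_gettime(CLOCK_MONOTONIC, &t1);  /* reading after the kernel */
    return elapsed_seconds(t0, t1);
}
```

For the 50x50 case a single call may be too short to time reliably, so repeating the kernel several times and dividing by the repetition count is one option. On systems with glibc older than 2.17, linking with -lrt may be needed for clock_gettime.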
Submit (in Canvas) your code, the tables, and the answers to the
questions. Please put everything in a .zip file containing
- Your code and job scripts. Your jobs should be able to run on the TA's account, so do not use user-specific paths or context.
- A README showing where your code is and how to run it on Stampede. It is recommended to put all commands in a .sh file and simply refer to it.
- Answer.pdf under the root directory showing your tables, plots and answers.
- Please use LaTeX or any other electronic tool you like to generate this PDF. Do not submit a scanned version of your draft.
Code: 40 points
Measurements (plots): 40 points
Explanation: 20 points
Stampede on TACC:
Use the login node only for development - do not run or debug
any executable on it. Run and debug your applications using the
To debug your code, you can print some debug info to the log.
Don't forget to cancel your job before submitting the next one; otherwise you may have multiple jobs modifying the same file.
to learn how to submit jobs.
- Use "sbatch" to submit jobs.
- Use "scancel" to cancel jobs.
- Use "sinfo" and "squeue" to monitor jobs.
- "-u", "-p" and "-n" are very useful in these SLURM commands.
- Use "module reset" to restore your module list to system default.
- "man" always provides you with useful information.
- If you want to develop on your own laptop, use git or rsync to sync with Stampede.
- Currently the default gcc module is 4.9.1, but you can also do "module load gcc/4.9.3"
- No recent version of llvm/clang is available. You can build your own if you really want to use C++1z features.
But do not do it on the login node!
- For those who like Mathematica, "module load mathematica" is available on the login node.
If you need X for rendering your output, there's a 'vis' queue.
I don't know why this command won't work on some nodes, but as long as you load it on the login node, it will carry over to your job.
To program with PAPI on Stampede, run:
module load papi
For help on using the module, run:
module help papi
For more information on using modules, check https://portal.tacc.utexas.edu/user-guides/stampede#compenv-modules
To see which papi counters are available on a host, run:
Read the PAPI manual http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:EventSets
for more information, including example code.
"Warning! num_cntrs is more than num_mpx_cntrs" can be