CS 377P: Programming for Performance

Assignment 2: Performance counters

Due date: February 14, 2017

Late submission policy: Submissions can be at most 2 days late. There is a 10% penalty for each day after the due date (cumulative).


Write C code for the 6 variants of matrix-matrix multiply (MMM) that you can generate by permuting the loops in the standard triply nested loop version of MMM. The matrix elements should be of type double.

Hint: To check the cache sizes on the machine, run:

    lscpu
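As an illustration of what permuting the loops means, here is a minimal sketch of two of the six orderings. The function names mmm_ijk and mmm_ikj, the row-major flat-array layout, and the accumulate-into-C convention are assumptions made for this example, not requirements of the assignment. The remaining four variants follow by reordering the three loop headers in the same way.

```c
#include <stddef.h>

/* Assumed layout: n x n matrices stored row-major in flat arrays,
   so element (i, j) lives at index i * n + j.
   Both functions compute C += A * B; the caller zeroes C first. */

/* i-j-k order: the innermost loop walks a row of A (unit stride)
   and a column of B (stride n). */
void mmm_ijk(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* i-k-j order: same loop body, but the innermost loop now walks
   rows of B and C with unit stride while A[i][k] stays fixed. */
void mmm_ikj(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```

The loop body is identical in every variant. Only the order of the loop headers changes, and with it the memory access pattern of the innermost loop.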


Submit (in Canvas) your code, the tables, and the answers to the questions. Please put everything in a .zip file containing the following:


  • Code: 40 points
  • Measurements (plots): 40 points
  • Explanation: 20 points

    Stampede on TACC:

    Use the login node only for development - do not run or debug any executable on it. Run and debug your applications using the job scheduler.

    To debug your code, you can print some debug info to the log.

    Don't forget to cancel your job before submitting the next one. Otherwise, you may have multiple jobs modifying the same file.

    Read http://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running to learn how to submit jobs.



    To program with PAPI on Stampede, run:

    module load papi

    For help on using the module, run:

    module help papi

    For more information on using modules, check https://portal.tacc.utexas.edu/user-guides/stampede#compenv-modules

    To see which PAPI counters are available on a host, run:


    Read the PAPI manual http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:EventSets for more information, including example code.

    The message "Warning! num_cntrs is more than num_mpx_cntrs" can be ignored.