CS 378: Programming for Performance

Assignment 1: Performance counters

Due date: February 7

Late submission policy: Submissions can be at most 2 days late. There is a cumulative penalty of 10% for each day after the due date.


Write C code for the six variants of matrix-matrix multiplication (matrixC = matrixA * matrixB) that you can generate by permuting the three loops. The matrix elements should be of type double.
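To make "permuting loops" concrete, here is a minimal sketch of two of the six orderings. It is not a reference solution. It assumes square n x n matrices stored in row-major order in flat arrays, and it assumes the caller has zeroed the output matrix before the call. The names matmul_ijk, matmul_ikj, and IDX are hypothetical and do not come from the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major index of element (r, c) in an n x n matrix (assumed layout). */
#define IDX(r, c, n) ((size_t)(r) * (size_t)(n) + (size_t)(c))

/* i-j-k order: the innermost loop walks along a row of a and down a
 * column of b while accumulating into a single element of c.
 * c must be zero-initialized by the caller. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[IDX(i, j, n)] += a[IDX(i, k, n)] * b[IDX(k, j, n)];
}

/* i-k-j order: same loop body, but the innermost loop now walks along
 * a row of b and a row of c while one element of a stays fixed.
 * c must be zero-initialized by the caller. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                c[IDX(i, j, n)] += a[IDX(i, k, n)] * b[IDX(k, j, n)];
}
```

The loop body is identical in both functions; only the nesting order changes, and with it the order in which memory is accessed. The remaining four orderings follow the same pattern. Declaring loop variables inside the for statement requires C99 or later (for example, -std=c99 with older compilers).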


Submit your code and a PDF containing the plots and explanation on Canvas.


  • Code: 40 points
  • Measurements (plots): 40 points
  • Explanation: 20 points

    Stampede on TACC:

    Use the login node only for development; do not run or debug any executable on it. Run and debug your applications through the job scheduler.

    Read http://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#running to learn how to submit jobs.


    To program with PAPI on Stampede, run:

    module load papi

    For help on using the module, run:

    module help papi

    For more information on using modules, check https://portal.tacc.utexas.edu/user-guides/stampede#compenv-modules

    To see which PAPI counters are available on a host, run:


    Read the PAPI manual at http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:EventSets for more information, including example code.
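As a rough orientation before reading the manual, the sketch below shows one way the PAPI low-level event-set calls fit together: initialize the library, create an event set, add the events the host supports, then start and stop counting around the code being measured. The two preset events (PAPI_TOT_CYC and PAPI_L1_DCM) and the placeholder workload are illustrative assumptions, not choices made by the assignment. The helper die is hypothetical.

```c
#include <papi.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_EVENTS 2

/* Hypothetical helper: report a failed PAPI call and exit. */
static void die(const char *call, int retval)
{
    fprintf(stderr, "%s failed: %s\n", call, PAPI_strerror(retval));
    exit(EXIT_FAILURE);
}

int main(void)
{
    /* Illustrative events only; not every host supports every preset. */
    const int events[NUM_EVENTS] = { PAPI_TOT_CYC, PAPI_L1_DCM };
    const char *names[NUM_EVENTS] = { "PAPI_TOT_CYC", "PAPI_L1_DCM" };
    const char *added[NUM_EVENTS];
    long long counts[NUM_EVENTS] = { 0 };
    int event_set = PAPI_NULL;
    int num_added = 0;
    int retval;

    /* The library must be initialized before any other PAPI call. */
    retval = PAPI_library_init(PAPI_VER_CURRENT);
    if (retval != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI_library_init failed (returned %d)\n", retval);
        return EXIT_FAILURE;
    }

    retval = PAPI_create_eventset(&event_set);
    if (retval != PAPI_OK)
        die("PAPI_create_eventset", retval);

    /* Add only the events this host reports as available. */
    for (int e = 0; e < NUM_EVENTS; e++) {
        if (PAPI_query_event(events[e]) != PAPI_OK) {
            fprintf(stderr, "%s is not available here, skipping\n", names[e]);
            continue;
        }
        retval = PAPI_add_event(event_set, events[e]);
        if (retval != PAPI_OK)
            die("PAPI_add_event", retval);
        added[num_added++] = names[e];
    }
    if (num_added == 0) {
        fprintf(stderr, "none of the requested events are available\n");
        return EXIT_FAILURE;
    }

    retval = PAPI_start(event_set);
    if (retval != PAPI_OK)
        die("PAPI_start", retval);

    /* Placeholder workload; the code being measured would go here. */
    volatile double sum = 0.0;
    for (int i = 0; i < 1000000; i++)
        sum += (double)i * 0.5;

    /* PAPI_stop copies the counter values into counts[], in the order
     * the events were added to the event set. */
    retval = PAPI_stop(event_set, counts);
    if (retval != PAPI_OK)
        die("PAPI_stop", retval);

    for (int e = 0; e < num_added; e++)
        printf("%s: %lld\n", added[e], counts[e]);

    return EXIT_SUCCESS;
}
```

The program must be linked against the PAPI library (-lpapi). The include and library paths depend on the installation, so check the output of module help papi rather than relying on this sketch. Events that are each available on their own can still fail to be added together when the hardware cannot count them at the same time; in that case PAPI_add_event returns an error and the sketch exits.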

    "Warning! num_cntrs is more than num_mpx_cntrs" can be ignored.