CS 377P: Programming for Performance

Assignment 1: Performance counters

Due date: February 10, 2020, 10:00PM

Late submission policy: Submissions can be at most 1 day late. There will be a 10% penalty for late submissions.

Description

1) Write C code for the 6 variants of matrix-matrix multiply (MMM) that you can generate by permuting the loops in the standard triply nested loop version of MMM. The matrix elements should be of type double.
    Hint: To check cache sizes on the machine, run:  lscpu

2) Answer the following questions, using a few sentences for each one.
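
For part (1), the six variants differ only in the order of the three loops; the loop body stays the same. The sketch below shows two of the six orders (i-j-k and i-k-j) as an illustration. The function names, the IDX macro, and the flat row-major layout are assumptions made for this example, not requirements of the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of element (i, j) in an n x n matrix (assumed layout). */
#define IDX(i, j, n) ((size_t)(i) * (size_t)(n) + (size_t)(j))

/* i-j-k order: the innermost loop walks a row of A and a column of B.
 * C must be zero-initialized by the caller. */
void mmm_ijk(size_t n, const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[IDX(i, j, n)] += A[IDX(i, k, n)] * B[IDX(k, j, n)];
}

/* i-k-j order: identical arithmetic, but the innermost loop now walks a
 * row of B and a row of C, so consecutive iterations touch adjacent memory. */
void mmm_ikj(size_t n, const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[IDX(i, j, n)] += A[IDX(i, k, n)] * B[IDX(k, j, n)];
}
```

The remaining four orders (j-i-k, j-k-i, k-i-j, k-j-i) follow the same pattern. All six should produce the same C for the same inputs, up to floating-point rounding, which makes a convenient correctness check before measuring.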

Deliverables

Submit (in Canvas) the following two files:

Grading

  • Code: 40 points
  • Measurements (plots): 30 points
  • Explanation: 10 points
  • Answers to short questions in (2): 20 points

  • Please note that we will check your source code. At a minimum, your code should include matrix multiplication, time measurement, PAPI measurement, cache clean-up, and CPUID (used as a serializing instruction).
    You can refer to the wiki for more information about CPUID.
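
    The cache clean-up, CPUID, and time measurement pieces listed above could be combined roughly as in the sketch below. This is only one possible arrangement under stated assumptions: the function names are hypothetical, the 64-byte cache line size is assumed (check lscpu), the inline assembly assumes GCC-style syntax on x86-64, and clock_gettime assumes a POSIX system. PAPI calls are left out here and would go alongside the timer calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* needed for clock_gettime in strict modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Execute CPUID so that earlier instructions complete before later ones
 * begin. On non-x86-64 targets this falls back to a compiler barrier only. */
void serialize(void)
{
#if defined(__x86_64__)
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                     :
                     : "memory");
    (void)ebx;
    (void)edx;
#else
    __asm__ volatile("" ::: "memory");
#endif
}

/* Evict earlier data by writing one byte per (assumed) 64-byte line of a
 * scratch buffer. Pass a size larger than the last-level cache.
 * Returns 0 on success, -1 if the allocation fails. */
int flush_cache(size_t bytes)
{
    assert(bytes > 0);
    volatile unsigned char *buf = malloc(bytes);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < bytes; i += 64)
        buf[i] = (unsigned char)i;
    free((void *)buf);
    return 0;
}

/* Monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time one run of `kernel` starting from a flushed cache.
 * Returns elapsed seconds, or -1.0 if the flush buffer cannot be allocated. */
double time_kernel(void (*kernel)(void), size_t flush_bytes)
{
    if (flush_cache(flush_bytes) != 0)
        return -1.0;
    serialize();
    double start = now_seconds();
    kernel();
    serialize();
    return now_seconds() - start;
}
```

    A caller would wrap one MMM variant in a function with no arguments and pass it to time_kernel, choosing flush_bytes from the cache sizes that lscpu reports.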

    PAPI:

    To see which papi counters are available on a host, run:

    papi_avail

    To see which papi counters can be collected at the same time, run:

    papi_event_chooser

    For more information, including example code, read the PAPI manual page on event sets (http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:EventSets) and the PAPI documentation (http://icl.cs.utk.edu/papi/docs/index.html).

    The message "Warning! num_cntrs is more than num_mpx_cntrs" can be ignored.

    ICC:

    To run ICC on the indicated CS machines, run:

    export PATH=$PATH:/opt/intel/bin
    icc [compiler commands]

    To check the availability of icc, run:

    icc -v