CS 377P: Programming for Performance
Assignment 1: Performance counters
Due date: Feb 6, 2026, 10:00PM
Late submission policy: Submissions can be at most 1 day late. There will be a 10% penalty for late submissions.
Description
1) Write C code for the 6 variants of matrix-matrix multiply (MMM) that you can generate by permuting the loops in the standard triply nested loop version of MMM. The matrix elements should be of type complex double, so you will need to include the header file <complex.h>.
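As an illustration of what "permuting loops" means, here is a minimal sketch of two of the six orderings (ijk and ikj). It assumes square n x n matrices stored in row-major order in flat arrays and the update C = C + A*B. The IDX macro and the function names are illustrative choices, not requirements of the assignment. The other four variants (jik, jki, kij, kji) follow by reordering the loop headers in the same way.

```c
#include <complex.h>
#include <stddef.h>

/* Row-major index into an n x n matrix stored in a flat array
   (illustrative helper, not required by the assignment). */
#define IDX(i, j, n) ((i) * (n) + (j))

/* ijk order: for each C[i][j], the innermost loop walks row i of A
   (unit stride) and column j of B (stride n). */
void mmm_ijk(size_t n, const complex double *A, const complex double *B,
             complex double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            complex double sum = C[IDX(i, j, n)];
            for (size_t k = 0; k < n; k++)
                sum += A[IDX(i, k, n)] * B[IDX(k, j, n)];
            C[IDX(i, j, n)] = sum;
        }
}

/* ikj order: the same multiply-adds with the j and k loops swapped.
   The innermost loop now walks row k of B and row i of C, both with
   unit stride, while A[i][k] stays fixed. */
void mmm_ikj(size_t n, const complex double *A, const complex double *B,
             complex double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            complex double a = A[IDX(i, k, n)];
            for (size_t j = 0; j < n; j++)
                C[IDX(i, j, n)] += a * B[IDX(k, j, n)];
        }
}
```

Because only the loop order changes, every variant performs the same n^3 complex multiply-adds. What differs is the stride of the innermost loop's memory accesses, which is what you would expect cache behavior to reflect. Holding C[i][j] or A[i][k] in a scalar temporary is an optional refinement; the plain version updates C[IDX(i, j, n)] directly.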
Hint: To check the cache sizes on the machine, run: lscpu
2) Answer the following questions, using a few sentences for each one.
- What is Moore's Law? What is Amdahl's Law? Which of these is an empirical observation and which is a mathematical fact?
- In the earliest ISAs, memory could only be accessed using the absolute addressing mode. What problems arise in implementing loops with such an ISA? How do we get around these problems in today's ISAs?
- What are data and control dependences? Give simple examples to illustrate these concepts.
- Explain out-of-order execution and in-order retirement/commit. Why do high-performance processors execute instructions out of order but retire them in order? What hardware structure(s) are used to implement in-order retirement?
- Consider the invariants for retirement in OOO execution with renaming shown in the lecture slides. Why do we need to check the condition "R3.PR# = ROB[n].PR#" before updating R3.v? Explain what would go wrong if we did not check this condition before updating R3.v.
- What is the typical branch prediction accuracy in modern processors? In lecture, we said, "A correctly predicted branch is essentially a NO-OP as far as performance is concerned." Explain this statement in a few sentences.
Deliverables
Submit the following two files on Canvas:
- A .tar.gz file with your code, a README.txt, and a Makefile.
  - The README.txt should describe how to run your program and what its output will be. A reasonable output is a list of pairs of the form "name of measured event, value".
  - Using the Makefile, your code should compile on the 5 CS machines by running only "make".
- A report (in .pdf) containing the tables and the answers to the questions in both parts.
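One way to produce the suggested "name of measured event, value" output is sketched below. It assumes each measurement is kept as a name/value pair. The struct measurement type, the print_measurements helper, and any event names passed to it are hypothetical placeholders; they are not events or interfaces the assignment prescribes.

```c
#include <stdio.h>

/* Hypothetical record for one measurement: an event name and its value. */
struct measurement {
    const char *name;
    long long value;
};

/* Print one "name, value" line per measurement to `out`.
   Returns the number of lines written, or -1 if a write fails. */
int print_measurements(FILE *out, const struct measurement *m, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (fprintf(out, "%s, %lld\n", m[i].name, m[i].value) < 0)
            return -1;
    return (int)count;
}
```

For example, calling print_measurements(stdout, m, 1) with m = {{"event_a", 42}} prints the line "event_a, 42". Taking a FILE * rather than always writing to stdout makes it easy to redirect the pairs to a file for building the report's tables.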
Grading
Code: 40 points
Measurements (plots): 30 points
Explanation: 10 points
Answers to short questions in (2): 20 points