## Preface

Over the years, we have noticed that many ideas that underlie programming for high performance can be illustrated through a very simple example: the computation of a matrix-matrix multiplication. It is perhaps for this reason that papers that expose the intricate details of how to optimize matrix-matrix multiplication are often used in the classroom [1] [4]. In this course, we have tried to carefully scaffold the techniques that lead to a high-performance implementation of matrix-matrix multiplication, with the aim of making the material useful to a broad audience.

It is our experience that some learners really get into this material. They dive in to tinker under the hood of the matrix-matrix multiplication implementation, much as some people tinker under the hood of a muscle car. Others merely become aware that they should write their applications in terms of libraries that provide high-performance implementations of the functionality they need. We believe that learners at either extreme, or anywhere in between, benefit from the knowledge this course shares.

Robert van de Geijn

Maggie Myers

Devangi Parikh

Austin, 2019