Subsection 12.4.3 Deriving blocked algorithms - We've got a MOOC for that too!

Those who delve deeper into how high performance is achieved for matrix-matrix multiplication find out that it is specifically the rank-k update, the case of \(C := \alpha A B + \beta C \) where the (inner) size \(k\) is small, that achieves high performance. The blocked LU factorization that we discussed in Subsection 5.5.2 takes advantage of this by casting most of its computation in the matrix-matrix multiplication \(A_{22} := A_{22} - A_{21} A_{12} \text{.}\) A question then becomes: how do I find blocked algorithms that cast most computation in terms of rank-k updates? A sketch of such a blocked factorization follows.
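
To make this concrete, here is a minimal sketch, in Python with NumPy (not a language used in this course), of a blocked right-looking LU factorization without pivoting. The function name blocked_lu and the block size parameter nb are illustrative assumptions rather than part of the course materials. The point to notice is that the bulk of the floating point operations land in the single line that performs \(A_{22} := A_{22} - A_{21} A_{12}\), a matrix-matrix multiplication whose inner size equals the (small) block size.

import numpy as np

def blocked_lu(A, nb=64):
    """Blocked right-looking LU factorization without pivoting, in place.

    Overwrites A with its unit lower triangular factor L (below the
    diagonal) and its upper triangular factor U (on and above it).
    """
    n = A.shape[0]
    for k in range(0, n, nb):
        b = min(nb, n - k)
        # Unblocked LU factorization of the diagonal block A11.
        for j in range(k, k + b):
            A[j+1:k+b, j] /= A[j, j]
            A[j+1:k+b, j+1:k+b] -= np.outer(A[j+1:k+b, j], A[j, j+1:k+b])
        L11 = np.tril(A[k:k+b, k:k+b], -1) + np.eye(b)
        U11 = np.triu(A[k:k+b, k:k+b])
        # A12 := L11^{-1} A12 and A21 := A21 U11^{-1} (triangular solves).
        A[k:k+b, k+b:] = np.linalg.solve(L11, A[k:k+b, k+b:])
        A[k+b:, k:k+b] = np.linalg.solve(U11.T, A[k+b:, k:k+b].T).T
        # Rank-b update A22 := A22 - A21 A12: this is where most flops go.
        A[k+b:, k+b:] -= A[k+b:, k:k+b] @ A[k:k+b, k+b:]
    return A

As a quick (hypothetical) check, one might apply it to a diagonally dominant matrix, for which no pivoting is needed, and verify that the factors reproduce the original:

A = np.random.rand(300, 300) + 300 * np.eye(300)  # diagonally dominant
LU = blocked_lu(A.copy(), nb=96)
L = np.tril(LU, -1) + np.eye(300)
U = np.triu(LU)
assert np.allclose(L @ U, A)

Because the update of \(A_{22}\) is expressed as one matrix-matrix multiply rather than a sequence of rank-1 updates, a tuned BLAS implementation can reuse data in cache and run near the processor's peak rate.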

The FLAME notation that we use in this course has made it possible for us to develop a systematic methodology for deriving (high-performance) algorithms. This methodology was published in

  • [4] Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Orti, Robert A. van de Geijn, The science of deriving dense linear algebra algorithms, ACM Transactions on Mathematical Software (TOMS), 2005.

and in various other publications that can be found on the FLAME project publications web site, http://www.cs.utexas.edu/~flame/web/FLAMEPublications.html.

You can learn these techniques, which derive algorithms hand-in-hand with their proofs of correctness, through our Massive Open Online Course