## Section2.2Blocked Matrix-Matrix Multiplication

### Unit2.2.1Basic Idea

Key to understanding how we are going to optimize matrix-matrix multiplication is to understand blocked matrix-matrix multiplication (the multiplication of matrices that have been partitioned into submatrices).

Consider $C := A B + C \text{.}$ In terms of the elements of the matrices, this means that each $\gamma_{i,j}$ is computed by

\begin{equation*} \gamma_{i,j} := \sum_{p=0}^{k-1} \alpha_{i,p} \beta_{p,j} + \gamma_{i,j} . \end{equation*}

Now, partition the matrices into submatrices:

\begin{equation*} C = \left( \begin{array}{c | c | c | c } C_{0,0} \amp C_{0,1} \amp \cdots \amp C_{0,N-1} \\ \hline C_{1,0} \amp C_{1,1} \amp \cdots \amp C_{1,N-1} \\ \hline \vdots \amp \vdots \amp \amp \vdots \\ \hline C_{M-1,0} \amp C_{M-1,1} \amp \cdots \amp C_{M-1,N-1} \end{array} \right), \end{equation*}
\begin{equation*} A = \left( \begin{array}{c | c | c | c } A_{0,0} \amp A_{0,1} \amp \cdots \amp A_{0,K-1} \\ \hline A_{1,0} \amp A_{1,1} \amp \cdots \amp A_{1,K-1} \\ \hline \vdots \amp \vdots \amp \amp \vdots \\ \hline A_{M-1,0} \amp A_{M-1,1} \amp \cdots \amp A_{M-1,K-1} \end{array} \right), \end{equation*}

and

\begin{equation*} B = \left( \begin{array}{c | c | c | c } B_{0,0} \amp B_{0,1} \amp \cdots \amp B_{0,N-1} \\ \hline B_{1,0} \amp B_{1,1} \amp \cdots \amp B_{1,N-1} \\ \hline \vdots \amp \vdots \amp \amp \vdots \\ \hline B_{K-1,0} \amp B_{K-1,1} \amp \cdots \amp B_{K-1,N-1} \end{array} \right), \end{equation*}

where

• $C_{i,j}$ is $m_i \times n_j \text{,}$

• $A_{i,p}$ is $m_i \times k_p \text{,}$ and

• $B_{p,j}$ is $k_p \times n_j$

with $\sum_{i=0}^{M-1} m_i = m \text{,}$ $\sum_{j=0}^{N-1} n_i = m \text{,}$ and $\sum_{p=0}^{K-1} k_i = m \text{.}$ Then

\begin{equation*} C_{i,j} := \sum_{p=0}^{K-1} A_{i,p} B_{p,j} + C_{i,j} . \end{equation*}
###### Remark2.2.2

In other words, provided the partitioning of the matrices is "conformal," matrix-matrix multiplication with partitioned matrices proceeds exactly as does matrix-matrix multiplication with the elements of the matrices except that the individual multiplications with the submatrices do not necessarily commute.

We illustrate how blocking $C$ into $m_b \times n_b$ submatrices, $A$ into $m_b \times n_b$ submatrices, and $B$ into $k_b \times n_b$ submatrices in Figure 2.2.3.

An implementation of one such algorithm is given in Figure 2.2.4.

### Unit2.2.2Haven't we seen this before?

We already employed various cases of blocked matrix-matrix multiplication in Week 1.

In Section 1.3 we already used the blocking of matrices to cast matrix-matrix multiplications in terms of dot products and axpy operations. This is captured in

For each of partitionings in the right column, indicate how $m_b \text{,}$ $n_b \text{,}$ and $k_b$ are chosen.

In Section 1.4, we already used the blocking of matrices to cast matrix-matrix multiplications in terms of matrix-vector multiplication and rank-1 updates. This is captured in

For each of partitionings in the right column, indicate how $m_b \text{,}$ $n_b \text{,}$ and $k_b$ are chosen.