
Unit 1.4.1 Rank-1 update (rank-1)

An operation that will become very important in future discussion and optimization of matrix-matrix multiplication is the rank-1 update:

\begin{equation*} A := x y^T + A . \end{equation*}

Partitioning \(A \text{,}\) \(x \text{,}\) and \(y \) into their individual entries, this update is computed as

\begin{equation*} \begin{array}{l} \left(\begin{array}{c | c | c | c} \alpha_{0,0} \amp \alpha_{0,1} \amp \cdots \amp \alpha_{0,n-1} \\ \hline \alpha_{1,0} \amp \alpha_{1,1} \amp \cdots \amp \alpha_{1,n-1} \\ \hline \vdots \amp \vdots \amp \amp \vdots \\ \hline \alpha_{m-1,0} \amp \alpha_{m-1,1} \amp \cdots \amp \alpha_{m-1,n-1} \end{array} \right):= \\ \left( \begin{array}{c} \chi_0 \\ \hline \chi_1 \\ \hline \vdots \\ \hline \chi_{m-1} \end{array} \right) \left( \begin{array}{c | c | c | c} \psi_0 \amp \psi_1 \amp \cdots \amp \psi_{n-1} \end{array} \right) + \left(\begin{array}{c | c | c | c} \alpha_{0,0} \amp \alpha_{0,1} \amp \cdots \amp \alpha_{0,n-1} \\ \hline \alpha_{1,0} \amp \alpha_{1,1} \amp \cdots \amp \alpha_{1,n-1} \\ \hline \vdots \amp \vdots \amp \amp \vdots \\ \hline \alpha_{m-1,0} \amp \alpha_{m-1,1} \amp \cdots \amp \alpha_{m-1,n-1} \end{array} \right) \\ = \left(\begin{array}{c | c | c | c} \chi_0 \psi_0 + \alpha_{0,0} \amp \chi_0 \psi_1 + \alpha_{0,1} \amp \cdots \amp \chi_0 \psi_{n-1} + \alpha_{0,n-1} \\ \hline \chi_1 \psi_0 + \alpha_{1,0} \amp \chi_1 \psi_1 + \alpha_{1,1} \amp \cdots \amp \chi_1 \psi_{n-1} + \alpha_{1,n-1} \\ \hline \vdots \amp \vdots \amp \amp \vdots \\ \hline \chi_{m-1} \psi_0 + \alpha_{m-1,0} \amp \chi_{m-1} \psi_1 + \alpha_{m-1,1} \amp \cdots \amp \chi_{m-1} \psi_{n-1} + \alpha_{m-1,n-1} \end{array} \right). \end{array} \end{equation*}

so that each entry \(\alpha_{i,j} \) is updated by

\begin{equation*} \alpha_{i,j} := \chi_i \psi_j + \alpha_{i,j}. \end{equation*}
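This elementwise update translates directly into a double loop. The sketch below is illustrative (the function and argument names are not from the text); it assumes \(A\) is stored in column-major order with leading dimension `ldA`, so that entry \(\alpha_{i,j}\) lives at `A[j * ldA + i]`:

```c
/* Rank-1 update A := x y^T + A, performed entry by entry:
     alpha(i,j) := chi_i * psi_j + alpha(i,j).
   A is m x n, stored column-major with leading dimension ldA.
   Names are illustrative, not prescribed by the text. */
void rank1_update(int m, int n, const double *x, const double *y,
                  double *A, int ldA)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            A[j * ldA + i] += x[i] * y[j];
}
```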

If we now focus on the columns in this last matrix, we find that

\begin{equation*} \begin{array}{l} \left(\begin{array}{c | c | c | c} \chi_0 \psi_0 + \alpha_{0,0} \amp \chi_0 \psi_1 + \alpha_{0,1} \amp \cdots \amp \chi_0 \psi_{n-1} + \alpha_{0,n-1} \\ \hline \chi_1 \psi_0 + \alpha_{1,0} \amp \chi_1 \psi_1 + \alpha_{1,1} \amp \cdots \amp \chi_1 \psi_{n-1} + \alpha_{1,n-1} \\ \hline \vdots \amp \vdots \amp \amp \vdots \\ \hline \chi_{m-1} \psi_0 + \alpha_{m-1,0} \amp \chi_{m-1} \psi_1 + \alpha_{m-1,1} \amp \cdots \amp \chi_{m-1} \psi_{n-1} + \alpha_{m-1,n-1} \end{array} \right) \\ = \left(\begin{array}{c | c | c | c} \left(\begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{m-1} \end{array}\right) \psi_0 + \left(\begin{array}{c} \alpha_{0,0} \\ \alpha_{1,0} \\ \vdots \\ \alpha_{m-1,0} \end{array}\right) \amp \cdots \amp \left(\begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{m-1} \end{array}\right) \psi_{n-1} + \left(\begin{array}{c} \alpha_{0,n-1} \\ \alpha_{1,n-1} \\ \vdots \\ \alpha_{m-1,n-1} \end{array}\right) \end{array} \right) \\ = \left(\begin{array}{c | c | c | c} \psi_0 \left(\begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{m-1} \end{array}\right) + \left(\begin{array}{c} \alpha_{0,0} \\ \alpha_{1,0} \\ \vdots \\ \alpha_{m-1,0} \end{array}\right) \amp \cdots \amp \psi_{n-1} \left(\begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{m-1} \end{array}\right) + \left(\begin{array}{c} \alpha_{0,n-1} \\ \alpha_{1,n-1} \\ \vdots \\ \alpha_{m-1,n-1} \end{array}\right) \end{array} \right) . \end{array} \end{equation*}

This illustrates that we could instead have partitioned \(A\) by columns and \(y \) by elements to find that

\begin{equation*} \begin{array}{l} \left(\begin{array}{c | c | c | c} a_0 \amp a_1 \amp \cdots \amp a_{n-1} \end{array} \right) \\ ~~~:= x \left( \begin{array}{c | c | c | c} \psi_0 \amp \psi_1 \amp \cdots \amp \psi_{n-1} \end{array} \right) + \left(\begin{array}{c | c | c | c} a_0 \amp a_1 \amp \cdots \amp a_{n-1} \end{array} \right) \\ ~~~= \left(\begin{array}{c | c | c | c} x \psi_0 + a_0 \amp x \psi_1 + a_1 \amp \cdots \amp x \psi_{n-1} + a_{n-1} \end{array} \right) \\ ~~~= \left(\begin{array}{c | c | c | c} \psi_0 x + a_0 \amp \psi_1 x + a_1 \amp \cdots \amp \psi_{n-1} x + a_{n-1} \end{array} \right). \end{array} \end{equation*}

This discussion explains the JI loop ordering for computing \(A := x y^T + A \text{:}\)

\begin{equation*} \begin{array}{l} {\bf for~} j := 0, \ldots , n-1 \\[0.05in] ~~~ \left. \begin{array}{l} {\bf for~} i := 0, \ldots , m-1 \\ ~~~ ~~~ \alpha_{i,j} := \chi_{i} \psi_{j} + \alpha_{i,j} \\ {\bf end} \end{array} \right\} ~~~ a_j := \psi_j x + a_j \\[0.1in] {\bf end} \end{array} \end{equation*}
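The JI ordering can be sketched as a loop over columns whose body is an axpy, \(a_j := \psi_j x + a_j \text{.}\) This is a sketch under the same assumptions as before (column-major storage with leading dimension `ldA`; the names are illustrative):

```c
/* axpy: y := alpha * x + y, for vectors of length m. */
void axpy(int m, double alpha, const double *x, double *y)
{
    for (int i = 0; i < m; i++)
        y[i] += alpha * x[i];
}

/* JI ordering of the rank-1 update: the outer loop marches over the
   columns of A; column j starts at &A[j * ldA] in column-major storage,
   and the inner loop performs a_j := psi_j * x + a_j. */
void rank1_ji(int m, int n, const double *x, const double *y,
              double *A, int ldA)
{
    for (int j = 0; j < n; j++)
        axpy(m, y[j], x, &A[j * ldA]);
}
```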

It demonstrates how the rank-1 update can be implemented as a sequence of axpy operations.

The last homework suggests that there is also an IJ loop ordering that can be explained by partitioning \(A \) by rows and \(x \) by elements:

\begin{equation*} \left( \begin{array}{c} \widetilde a_0^T \\ \hline \widetilde a_1^T \\ \hline \vdots \\ \hline \widetilde a_{m-1}^T \end{array} \right) := \left( \begin{array}{c} \chi_0 \\ \hline \chi_1 \\ \hline \vdots \\ \hline \chi_{m-1} \end{array} \right) y^T + \left( \begin{array}{c} \widetilde a_0^T \\ \hline \widetilde a_1^T \\ \hline \vdots \\ \hline \widetilde a_{m-1}^T \end{array} \right) = \left( \begin{array}{c} \chi_0 y^T + \widetilde a_0^T \\ \hline \chi_1 y^T + \widetilde a_1^T \\ \hline \vdots \\ \hline \chi_{m-1} y^T + \widetilde a_{m-1}^T \end{array} \right) \end{equation*}

leading to the algorithm

\begin{equation*} \begin{array}{l} {\bf for~} i := 0, \ldots , m-1 \\[0.05in] ~~~ \left. \begin{array}{l} {\bf for~} j := 0, \ldots , n-1 \\ ~~~ ~~~ \alpha_{i,j} := \chi_{i} \psi_{j} + \alpha_{i,j} \\ {\bf end} \end{array} \right\} ~~~ \widetilde a_i^T := \chi_i y^T + \widetilde a_i^T \\[0.1in] {\bf end} \end{array} \end{equation*}

and corresponding implementation.
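A sketch of that implementation, under the same illustrative assumptions (column-major storage with leading dimension `ldA`): with this layout the elements of row \(i\) are `ldA` apart in memory, so the inner loop is an axpy with stride `ldA` on the row \(\widetilde a_i^T \text{:}\)

```c
/* IJ ordering of the rank-1 update: the outer loop marches over the
   rows of A; the inner loop computes a~_i^T := chi_i * y^T + a~_i^T.
   In column-major storage, consecutive elements of a row are ldA
   apart, so the inner axpy accesses A with stride ldA. */
void rank1_ij(int m, int n, const double *x, const double *y,
              double *A, int ldA)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            A[j * ldA + i] += x[i] * y[j];
}
```

Note that both orderings perform exactly the same \(m \times n\) scalar updates; they differ only in the order in which memory is traversed, which is what later discussion of performance exploits.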