ALAFF Blocked LU factorization

Subsection 5.5.2 Blocked LU factorization

Recall from Subsection 3.3.4 that casting computation in terms of matrix-matrix multiplication facilitates high performance. In this unit we very briefly illustrate how the right-looking LU factorization can be reformulated as such a "blocked" algorithm. For details on other blocked LU factorization algorithms and blocked Cholesky factorization algorithms, we once again refer the interested reader to our Massive Open Online Course titled "LAFF-On Programming for Correctness" [29]. We will revisit these kinds of issues in the final week of this course.

Consider \(A = L U \) and partition these matrices as

\begin{equation*} A \rightarrow \FlaTwoByTwo{ A_{11} }{ A_{12} }{ A_{21} }{ A_{22} }, L \rightarrow \FlaTwoByTwo{ L_{11} }{ 0 }{ L_{21} }{ L_{22} }, U \rightarrow \FlaTwoByTwo{ U_{11} }{ U_{12} }{ 0 }{ U_{22} }, \end{equation*}

where \(A_{11}\text{,}\) \(L_{11} \text{,}\) and \(U_{11} \) are \(b \times b \) submatrices. Then

\begin{equation*} \FlaTwoByTwo{ A_{11} }{ A_{12} }{ A_{21} }{ A_{22} } = \FlaTwoByTwo{ L_{11} }{ 0 }{ L_{21} }{ L_{22} } \FlaTwoByTwo{ U_{11} }{ U_{12} }{ 0 }{ U_{22} } = \FlaTwoByTwo{ L_{11} U_{11} }{ L_{11} A_{12} }{ A_{21} U_{11} }{ A_{22} - L_{21} U_{12} }. \end{equation*}

From this we conclude that

\begin{equation*} \begin{array}{ c | c } A_{11} = L_{11} U_{11} \amp A_{12} = L_{11} U_{12} \\ \hline A_{21} = L_{21} U_{11} \amp A_{22} - L_{21} U_{12} = L_{22} U_{22}. \end{array} \end{equation*}

This suggests the following steps:

Compute the LU factorization of \(A_{11} \) (e.g., using any of the "unblocked' algorithms from Subsection 5.5.1).

\begin{equation*} A_{11} = L_{11} U_{11}, \end{equation*}

overwriting \(A_{11} \) with the factors.
Solve

\begin{equation*} L_{11} U_{12} = A_{12} \end{equation*}

for \(U_{12} \text{,}\) overwriting \(A_{12} \) with the result. This is known as a "triangular solve with multple right-hand sides." This comes from the fact that solving

\begin{equation*} L X = B, \end{equation*}

where \(L \) is lower triangular, can be reformulated by partitioning \(X \) and \(B \) by columns,

\begin{equation*} \begin{array}[t]{c} \underbrace{ L \left( \begin{array}{c | c | c} x_0 \amp x_1 \amp \cdots \end{array} \right) } \\ \left( \begin{array}{c | c | c} L x_0 \amp L x_1 \amp \cdots \end{array} \right) \end{array} = \left( \begin{array}{c | c | c} b_0 \amp b_1 \amp \cdots \end{array} \right) , \end{equation*}

which exposes that for each pair of columns we must solve the unit lower triangular system \(L x_j = b_j \text{.}\)
Solve

\begin{equation*} L_{21} U_{11} = A_{21} \end{equation*}

for \(L_{21} \text{,}\) overwriting \(A_{21} \) with the result. This is also a "triangular solve with multple right-hand sides" since we can instead view it as solving the lower triangular system with multiple right-hand sides

\begin{equation*} U_{11}^T L_{21}^T = A_{21}^T. \end{equation*}

(In practice, the matrices are not transposed.)
Update

\begin{equation*} A_{22} := A_{22} - L_{21} U_{12}. \end{equation*}
Proceed by computing the LU factorization of the updated \(A_{22} \text{.}\)

This motivates the algorithm in Figure 5.5.2.1.

\begin{equation*} \newcommand{\routinename}{A = \mbox{LU-blk-var5}( A )} \newcommand{\guard}{ n( A_{TL} ) \lt n( A ) } \newcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} } \newcommand{\partitionsizes}{ A_{TL} {\rm ~is~} 0 \times 0 } \newcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{A_{01}}{A_{02}} {A_{10}}{A_{11}}{A_{12}} {A_{20}}{A_{21}}{A_{22}} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{A_{01}}{A_{02}} {A_{10}}{A_{11}}{A_{12}} {A_{20}}{A_{21}}{A_{22}} } \newcommand{\update}{ \begin{array}{ll} A_{11} := \mbox{LU}( A_{11} ) \amp L_{11} \mbox{ and } U_{11} \mbox{ overwrite } A_{11} \\ \mbox{Solve } L_{11} U_{12} = A_{12} \amp \mbox{overwriting } A_{12} \mbox{ with } U_{12} \\ \mbox{Solve } L_{21} U_{11} = A_{21} \amp \mbox{overwriting } A_{21} \mbox{ with } L_{21} \\ A_{22} := A_{22} - A_{21} A_{12} \end{array} } \FlaAlgorithm \end{equation*}

Figure 5.5.2.1. Blocked Variant 5 (classical Gaussian elimination) LU factorization algorithm.

The important observation is that if \(A \) is \(m \times m \) and \(b \) is much smaller than \(m \text{,}\) then most of the computation is in the matrix-matrix multiplication \(A_{22} := A_{22} - A_{21} A_{12} \text{.}\)

Remark 5.5.2.2.

For each (unblocked) algorithm in Subsection 5.5.1, there is a corresponding blocked algorithm.