## Unit5.5.2Blocked LU factorization

Recall from Unit 3.3.4 that casting computation in terms of matrix-matrix multiplication facilitates high performance. In this unit we very briefly illustrate how the right-looking LU factorization can be reformulated as such a "blocked" algorithm. For details on other blocked LU factorization algorithms and blocked Cholesky factorization algorithms, we once again refer the interested reader to our Massive Open Online Course titled "LAFF-On Programming for Correctness" . We will revisit these kinds of issues in the final week of this course.

Consider $A = L U$ and partition these matrices as

\begin{equation*} A \rightarrow \FlaTwoByTwo{ A_{11} }{ A_{12} }{ A_{21} }{ A_{22} }, L \rightarrow \FlaTwoByTwo{ L_{11} }{ 0 }{ L_{21} }{ L_{22} }, U \rightarrow \FlaTwoByTwo{ U_{11} }{ U_{12} }{ 0 }{ U_{22} }, \end{equation*}

where $A_{11}\text{,}$ $L_{11} \text{,}$ and $U_{11}$ are $b \times b$ submatrices. Then

\begin{equation*} \FlaTwoByTwo{ A_{11} }{ A_{12} }{ A_{21} }{ A_{22} } = \FlaTwoByTwo{ L_{11} }{ 0 }{ L_{21} }{ L_{22} } \FlaTwoByTwo{ U_{11} }{ U_{12} }{ 0 }{ U_{22} } = \FlaTwoByTwo{ L_{11} U_{11} }{ L_{11} A_{12} }{ A_{21} U_{11} }{ A_{22} - L_{21} U_{12} }. \end{equation*}

From this we conclude that

\begin{equation*} \begin{array}{ c | c } A_{11} = L_{11} U_{11} \amp A_{12} = L_{11} U_{12} \\ \hline A_{21} = L_{21} U_{11} \amp A_{22} - L_{21} U_{12} = L_{22} U_{22}. \end{array} \end{equation*}

This suggests the following steps:

• Compute the LU factorization of $A_{11}$ (e.g., using any of the "unblocked' algorithms from Unit 5.5.1).

\begin{equation*} A_{11} = L_{11} U_{11}, \end{equation*}

overwriting $A_{11}$ with the factors.

• Solve

\begin{equation*} L_{11} U_{12} = A_{12} \end{equation*}

for $U_{12} \text{,}$ overwriting $A_{12}$ with the result. This is known as a "triangular solve with multple right-hand sides." This comes from the fact that solving

\begin{equation*} L X = B, \end{equation*}

where $L$ is lower triangular, can be reformulated by partitioning $X$ and $B$ by columns,

\begin{equation*} \begin{array}[t]{c} \underbrace{ L \left( \begin{array}{c | c | c} x_0 \amp x_1 \amp \cdots \end{array} \right) } \\ \left( \begin{array}{c | c | c} L x_0 \amp L x_1 \amp \cdots \end{array} \right) \end{array} = \left( \begin{array}{c | c | c} b_0 \amp b_1 \amp \cdots \end{array} \right) , \end{equation*}

which exposes that for each pair of columns we must solve the unit lower triangular system $L x_j = b_j \text{.}$

• Solve

\begin{equation*} L_{21} U_{11} = A_{21} \end{equation*}

for $L_{21} \text{,}$ overwriting $A_{21}$ with the result. This is also a "triangular solve with multple right-hand sides" since we can instead view it as solving the lower triangular system with multiple right-hand sides

\begin{equation*} U_{11}^T L_{21}^T = A_{21}^T. \end{equation*}

(In practice, the matrices are not transposed.)

• Update

\begin{equation*} A_{22} := A_{22} - L_{21} U_{12}. \end{equation*}
• Proceed by computing the LU factorization of the updated $A_{22} \text{.}$

This motivates the algorithm in Figure 5.5.2.1.

The important observation is that if $A$ is $m \times m$ and $b$ is much smaller than $m \text{,}$ then most of the computation is in the matrix-matrix multiplication $A_{22} := A_{22} - A_{21} A_{12} \text{.}$

###### Remark5.5.2.2.

For each (unblocked) algorithm in Unit 5.5.1, there is a corresponding blocked algorithm.