## Subsection5.6.1Other LU factorization algorithms

There are actually five different (unblocked) algorithms for computing the LU factorization that were discovered over the course of the centuries. Here we show how to systematically derive all five.

Finding the algorithms starts with the following observations.

• Our algorithms will overwrite the matrix $A \text{,}$ and hence we introduce $\widehat A$ to denote the original contents of $A \text{.}$ We will say that the precondition for the algorithm is that

\begin{equation*} A = \widehat A \end{equation*}
• ($A$ starts by containing the original contents of $A \text{.}$)

• We wish to overwrite $A$ with $L$ and $U \text{.}$ Thus, the postcondition for the algorithm (the state in which we wish to exit the algorithm) is that

\begin{equation*} A = L \backslash U \wedge L U = \widehat A \end{equation*}

($A$ is overwritten by $L$ below the diagonal and $U$ on and above the diagonal, where multiplying $L$ and $U$ yields the original matrix $A\text{.}$)

• All the algorithms will march through the matrices from top-left to bottom-right, giving us the code skeleton in Figure 5.30. Since the computed $L$ and $U$ overwrite $A \text{,}$ throughout they are partitioned comformal to (in the same way) as is $A \text{.}$

• Thus, before and after each iteration of the loop the matrices are viewed as quadrants:

\begin{equation*} A \rightarrow \FlaTwoByTwo { A_{TL}}{ A_{TR}} { A_{BL}}{ A_{BR}} , L \rightarrow \FlaTwoByTwo { L_{TL}}{ 0 } { L_{BL}}{ L_{BR}} , \mbox{and} \quad U \rightarrow \FlaTwoByTwo { U_{TL}}{ U_{TR} } { 0}{ U_{BR}}. \end{equation*}

where $A_{TL} \text{,}$ $L_{TL} \text{,}$ and $U_{TL}$ are all square and equally sized.

• In terms of these exposed quadrants, in the end we wish for matrix $A$ to contain

\begin{equation*} \begin{array}{l} \FlaTwoByTwo { A_{TL} }{ A_{TR} } { A_{BL} }{ A_{BR} } = \FlaTwoByTwo { L \backslash U_{TL} }{ U_{TR} } { L_{BL} }{ L\backslash U_{BR} } \\ ~~~~~\wedge \FlaTwoByTwo { L_{TL} }{0} { L_{BL} }{ L_{BR} } \FlaTwoByTwo { U_{TL} }{U_{TR}} { 0 }{ U_{BR} } = \FlaTwoByTwo { \widehat A_{TL} }{ \widehat A_{TR} } { \widehat A_{BL} }{ \widehat A_{BR} } \end{array} \end{equation*}
• Manipulating this yields what we call the Partitioned Matrix Expression (PME), which can be viewed as a recursive definition of the LU factorization:

\begin{equation*} \begin{array}{l} \FlaTwoByTwo { A_{TL} }{ A_{TR} } { A_{BL} }{ A_{BR} } = \FlaTwoByTwo { L \backslash U_{TL} }{ U_{TR} } { L_{BL} }{ L \backslash U_{BR} } \\ ~~~~~\wedge \begin{array}{c | c} { L_{TL} U_{TL} = \widehat A_{TL} } \amp { L_{TL} U_{TR} = \widehat A_{TR} } \\ \hline { L_{BL} U_{TL} = \widehat A_{BL} } \amp { L_{BR} U_{BR} = \widehat A_{BR} - L_{BL} U_{TR} }. \end{array} \end{array} \end{equation*}
• Now, consider the code skeleton for the LU factorization in Figure 5.30. At the top of the loop (right after the $\color{blue} {\bf while}$), we want to maintain certain contents in matrix $A \text{.}$ Since we are in a loop, we haven't yet overwritten $A$ with the final result. Instead, some progress toward this final result has been made. The way we can find what the state of $A$ is that we would like to maintain is to take the PME and delete subexpression. For example, consider the following condition on the contents of $A \text{:}$

\begin{equation*} \begin{array}{l} \FlaTwoByTwo { A_{TL} }{ A_{TR} } { A_{BL} }{ A_{BR} } = \FlaTwoByTwo { L \backslash U_{TL} }{ U_{TR} } { L_{BL} }{ \widehat A_{BR} - L_{BL} U_{TR} } \\ ~~~~~\wedge \begin{array}{c | c} { L_{TL} U_{TL} = \widehat A_{TL} } \amp { L_{TL} U_{TR} = \widehat A_{TR} } \\ \hline { L_{BL} U_{TL} = \widehat A_{BL} } \amp { \phantom{L_{BR} U_{BR} = \widehat A_{BR} - L_{BL} U_{TR}} }. \end{array} \end{array} \end{equation*}

What we are saying is that $A_{TL} \text{,}$ $A_{TR} \text{,}$ and $A_{BL}$ have been completely updated with the corresponding parts of $L$ and $U\text{,}$ and $A_{BR}$ has been partially updated. This is exactly the state that the right-looking algorithm that we discussed in Subsection 5.2.4 maintains! What is left is to factor $A_{BR} \text{,}$ since it contains $\widehat A_{BR} - L_{BL} U_{TR} \text{,}$ and $\widehat A_{BR} - L_{BL} U_{TR} = L_{BR} U_{BR} \text{.}$

• By carefully analyzing the order in which computation must occur (in compiler lingo: by performing a dependence analysis), we can identify five states that can be maintained at the top of the loop, by deleting subexpressions from the PME. These are called loop invariants. There are five for LU factorization:

\begin{equation*} \small \begin{array}{c} \begin{array}{c | c} \FlaTwoByTwo { A_{TL} }{ A_{TR} } { A_{BL} }{ A_{BR} } = \FlaTwoByTwo { L \backslash U_{TL} }{ \widehat A_{TR} } { \widehat A_{BL} }{ \widehat A_{BR} } \amp \FlaTwoByTwo { A_{TL} }{ A_{TR} } { A_{BL} }{ A_{BR} } = \FlaTwoByTwo { L \backslash U_{TL} }{ U_{TR} } { \widehat A_{BL} }{ \widehat A_{BR} } \\ \mbox{Invariant 1} \amp \mbox{Invariant 2} \\ \hline \FlaTwoByTwo { A_{TL} }{ A_{TR} } { A_{BL} }{ A_{BR} } = \FlaTwoByTwo { L \backslash U_{TL} }{ \widehat A_{TR} } { \widehat L_{BL} }{ \widehat A_{BR} } \amp \FlaTwoByTwo { A_{TL} }{ A_{TR} } { A_{BL} }{ A_{BR} } = \FlaTwoByTwo { L \backslash U_{TL} }{ U_{TR} } { L_{BL} }{ \widehat A_{BR} } \\ \mbox{Invariant 3} \amp \mbox{Invariant 4} \\ \hline \end{array} \\ \FlaTwoByTwo { A_{TL} }{ A_{TR} } { A_{BL} }{ A_{BR} } = \FlaTwoByTwo { L \backslash U_{TL} }{ U_{TR} } { L_{BL} }{ \widehat A_{BR}-L_{BL} U_{TR} } \\ \mbox{Invariant 5} \end{array} \end{equation*}
• Key to figuring out what updates must occur in the loop for each of the variants is to look at how the matrices are repartitioned at the top and bottom of the loop body.

For each of the five algorithms for LU factorization, we will derive the loop invariant, and then derive the algorithm from the loop invariant.

### Subsubsection5.6.1.1Variant 1: Bordered algorithm

Consider the loop invariant:

\begin{equation*} \FlaTwoByTwo {A_{TL} }{ A_{TR} } {A_{BL}}{A_{BR}} = \FlaTwoByTwo {\color{red}{ L \backslash U_{TL}} }{ \widehat A_{TR} } { \widehat A_{BL}}{ \widehat A_{BR}} \wedge L_{TL} U_{TL} = \widehat A_{TL}, \end{equation*}

meaning that the leading principle submatrix $A_{TL}$ has been overwritten with its LU factorization, and the remainder of the matrix has not yet been touched.

At the top of the loop, after repartitioning, $A$ then contains

\begin{equation*} \FlaThreeByThreeBR {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c | c c} {\color{red}{L \backslash U_{00}}} \amp \widehat a_{01} \amp \widehat A_{02} \\ \hline \widehat a_{10}^T \amp \widehat \alpha_{11} \amp \widehat a_{12}^T \\ \widehat A_{20} \amp \widehat a_{21} \amp \widehat A_{22} \end{array} \right) \wedge {\color{red} {L_{00} U_{00}}} = \widehat A_{00} \end{equation*}

while after updating $A$ it must contain

\begin{equation*} \begin{array}{l} \FlaThreeByThreeTL {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c c | c} {\color{red} {L \backslash U_{00}}} \amp {\color{blue} {u_{01}}} \amp \widehat A_{02} \\ {\color{blue} {l_{10}^T}} \amp {\color{blue} {\upsilon_{11}}} \amp \widehat a_{12}^T \\ \hline \widehat A_{20} \amp \widehat a_{21} \amp \widehat A_{22}, \end{array} \right) \\ ~~~~~ \wedge \begin{array}[t]{c} \underbrace{ \left( \begin{array}{c c} L_{00} \amp 0 \\ l_{10}^T \amp 1 \end{array} \right) \left( \begin{array}{c c} U_{00} \amp u_{01} \\ 0 \amp \upsilon_{11} \end{array} \right) = \left( \begin{array}{c c} \widehat A_{00} \amp \widehat a_{01} \\ \widehat a_{10}^T \amp \widehat \alpha_{11} \end{array} \right) }\\ \begin{array}{r r} { { \color{red} {L_{00} U_{00}} } = \widehat A_{00} } \amp { {\color{red}{ L_{00}}} {\color{blue} {u_{01}}} = \widehat a_{01} } \\ { {\color{blue} {l_{10}^T}} {\color{red} {U_{00}}} = \widehat a_{10}^T } \amp { {\color{blue} {l_{10}^T}} {\color{blue} {u_{01}}} + {\color{blue} {\upsilon_{11}}} = \widehat \alpha_{11} } \end{array} \end{array} \end{array} \end{equation*}

for the loop invariant to again hold after the iteration. Here the entries in red are known (in addition to the ones marked with a "hat") and the entries in blue are to be computed. With this, we can compute the desired parts of $L$ and $U \text{:}$

• Solve $L_{00} u_{01} = a_{01} \text{,}$ overwriting $a_{01}$ with the result. (Notice that $a_{01} = \widehat a_{01}$ before this update.)

• Solve $l_{10}^T U_{00} = a_{10}^T$ (or, equivalently, $U_{00}^T (l_{10}^T)^T = (a_{10}^T)^T$ for $l_{10}^T$), overwriting $a_{10}^T$ with the result. (Notice that $a_{10}^T = \widehat a_{10}^T$ before this update.)

• Update $\alpha_{11} := \upsilon_{11} = \alpha_{11} - l_{10}^T u_{01} \text{.}$ (Notice that by this computation, $a_{10}^T = l_{10}^T$ and $a_{01} = u_{01} \text{.}$)

The resulting algorithm is captured in Figure 5.31.

###### Homework5.6.1.

If $A$ is $n \times n \text{,}$ show the cost of Variant 1 is approximately $\frac{2}{3} n^3 \text{.}$

Solution

During the $k$th iteration, $A_{00}$ is $k \times k \text{,}$ for $k = 0, \ldots, n-1 \text{.}$ Then the (approximate) cost of the steps are given by

• Solve $L_{00} u_{01} = a_{01} \text{,}$ overwriting $a_{01}$ with the result. Cost: approximately $k^2$ flops.

• Solve $l_{10}^T U_{00} = a_{10}^T$ (or, equivalently, $U_{00}^T (l_{10}^T)^T = (a_{10}^T)^T$ for $l_{10}^T$), overwriting $a_{10}^T$ with the result. Cost: approximately $k^2$ flops.

• Compute $\upsilon_{11} = \alpha_{11} - l_{10}^T u_{01} \text{,}$ overwriting $\alpha_{11}$ with the result. Cost: $2k$ flops.

Thus, the total cost is given by

\begin{equation*} \sum_{k=0}^{n-1} \left( k^2 + k^2 + 2k \right) \approx 2 \sum_{k=0}^{n-1} k^2 \approx 2 \frac{1}{3} n^3 = \frac{2}{3} n^3. \end{equation*}

### Subsubsection5.6.1.2Variant 2: Up-looking algorithm

Consider next the loop invariant:

\begin{equation*} \FlaTwoByTwo {A_{TL} }{ A_{TR} } {A_{BL}}{A_{BR}} = \FlaTwoByTwo {\color{red}{ L \backslash U_{TL}} }{ \color{red}{U_{TR}} } { \widehat A_{BL}}{ \widehat A_{BR}} \wedge \begin{array}{l | l} L_{TL} U_{TL} = \widehat A_{TL} \amp L_{TL} U_{TR} = \widehat A_{TR} \end{array} \end{equation*}

meaning that the leading principle submatrix $A_{TL}$ has been overwritten with its LU factorization and $U_{TR}$ has overwritten $A_{TR} \text{.}$

At the top of the loop, after repartitioning, $A$ then contains

\begin{equation*} \begin{array}{l} \FlaThreeByThreeBR {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c | c c} {\color{red}{L \backslash U_{00}}} \amp \color{red}{u_{01}} \amp \color{red}{U_{02}} \\ \hline \widehat a_{10}^T \amp \widehat \alpha_{11} \amp \widehat a_{12}^T \\ \widehat A_{20} \amp \widehat a_{21} \amp \widehat A_{22} \end{array} \right) \\ ~~~~~ \wedge \begin{array}[t]{c} \underbrace{ \begin{array}{r | r } L_{00} U_{00} = \widehat A_{00} \amp L_{00} \left( \begin{array}{r r} u_{01} \amp U_{02} \end{array} \right) \end{array} = \left( \begin{array}{r r} \widehat a_{01} \amp \widehat A_{02} \end{array} \right) } \\ \begin{array}{r | r r} {\color{red} {L_{00} U_{00}}} = \widehat A_{00} \amp {\color{red} {L_{00} u_{01}}} = \widehat a_{01} \amp {\color{red} {L_{00} U_{02}}} = \widehat A_{02} \end{array} \end{array} \end{array} \end{equation*}

while after updating $A$ it must contain

\begin{equation*} \begin{array}{l} \FlaThreeByThreeTL {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c c | c} {\color{red} {L \backslash U_{00}}} \amp {\color{red} {u_{01}}} \amp {\color{red} {U_{02}}} \\ {\color{blue} {l_{10}^T}} \amp {\color{blue} {\upsilon_{11}}} \amp {\color{blue} {u_{12}^T}} \\ \hline \widehat A_{20} \amp \widehat a_{21} \amp \widehat A_{22}, \end{array} \right) \\ ~~~~~ \wedge \begin{array}[t]{c} \underbrace{ % \small \begin{array}{r|r} \left( \begin{array}{c c} L_{00} \amp 0 \\ l_{10}^T \amp 1 \end{array} \right) \left( \begin{array}{c c} U_{00} \amp u_{01} \\ 0 \amp \upsilon_{11} \end{array} \right) = \left( \begin{array}{c c} \widehat A_{00} \amp \widehat a_{01} \\ \widehat a_{10}^T \amp \widehat \alpha_{11} \end{array} \right) \amp \left( \begin{array}{c c} L_{00} \amp 0 \\ l_{10}^T \amp 1 \end{array} \right) \left( \begin{array}{c c} U_{02} \\ u_{12}^T \end{array} \right) = \left( \begin{array}{c c} \widehat A_{00} \\ \widehat a_{12}^T \end{array} \right) \end{array} }\\ \begin{array}{r r| r } { { \color{red} {L_{00} U_{00}} } = \widehat A_{00} } \amp { {\color{red}{ L_{00}}} {\color{red} {u_{01}}} = \widehat a_{01} } \amp { {\color{red}{ L_{00}}} {\color{red} {U_{02}}} = \widehat A_{02} } \\ { {\color{blue} {l_{10}^T}} {\color{red} {U_{00}}} = \widehat a_{10}^T } \amp { {\color{blue} {l_{10}^T}} {\color{red} {u_{01}}} + {\color{blue} {\upsilon_{11}}} = \widehat \alpha_{11} } \amp { {\color{blue} {l_{10}^T}} {\color{red} {U_{02}}} + {\color{blue} {u_{12}^T}} = \widehat a_{12}^T } \end{array} \end{array} \end{array} \end{equation*}

for the loop invariant to again hold after the iteration. Here, again, the entries in red are known (in addition to the ones marked with a "hat") and the entries in blue are to be computed. With this, we can compute the desired parts of $L$ and $U \text{:}$

• Solve $l_{10}^T U_{00} = a_{10}^T \text{,}$ overwriting $a_{10}^T$ with the result.

• Update $\alpha_{11} := \upsilon_{11} = \alpha_{11} - l_{10}^T u_{01} = \alpha_{11} - a_{10}^T a_{01} \text{.}$

• Update $a_{12}^T := u_{12}^T = a_{12}^T - l_{10}^T U_{02} = a_{12}^T - a_{10}^T A_{02} \text{.}$

The resulting algorithm is captured in Figure 5.32.

###### Homework5.6.2.

If $A$ is $n \times n \text{,}$ show the cost of Variant 2 is approximately $\frac{2}{3} n^3 \text{.}$

Solution

During the $k$th iteration, $A_{00}$ is $k \times k \text{,}$ for $k = 0, \ldots, n-1 \text{.}$ Then the (approximate) cost of the steps are given by

• Solve $l_{10}^T U_{00} = a_{10}^T \text{,}$ overwriting $a_{10}^T$ with the result. Approximate cost: $k^2$ flops.

• Update $\alpha_{11} := \upsilon_{11} = \alpha_{11} - l_{10}^T u_{01} = \alpha_{11} - a_{10}^T a_{01} \text{.}$ Approximate cost: $2k$ flops.

• Update $a_{12}^T := u_{12}^T = a_{12}^T - l_{10}^T U_{02} = a_{12}^T - a_{10}^T A_{02} \text{.}$ Approximate cost: $2 k( n-k-1)$ flops.

Thus, the total cost is approximately given by

\begin{equation*} \begin{array}{l} \sum_{k=0}^{n-1} \left( k^2 + 2k + 2 k ( n-k-1) \right) \\ ~~~=~~~~ \lt \mbox{ simplify } \gt \\ \sum_{k=0}^{n-1} \left( 2 k n -k^2 ) \right) \\ ~~~=~~~~ \lt \mbox{ algebra } \gt \\ 2 n \sum_{k=0}^{n-1} k - \sum_{k=0}^{n-1} k^2 \\ ~~~\approx~~~~ \lt \sum_{k=0}^{n-1} k \approx n^2/2; \sum_{k=0}^{n-1} k^2 \approx k^3/3 \gt \\ n^3 - \frac{n^3}{3} \\ ~~~=~~~~ \lt \mbox{ simplify } \gt \\ \frac{2}{3} n^3. \end{array} \end{equation*}

### Subsubsection5.6.1.3Variant 3: Left-looking algorithm

Consider the loop invariant:

\begin{equation*} \FlaTwoByTwo {A_{TL} }{ A_{TR} } {A_{BL}}{A_{BR}} = \FlaTwoByTwo {\color{red}{ L \backslash U_{TL}} }{ \color{black}{\widehat A_{TR}}} { \color{red}{L_{BL}}}{ \widehat A_{BR}} \wedge \begin{array}{r | r} L_{TL} U_{TL} = \widehat A_{TL} \amp \phantom{L_{TL} U_{TR} = \widehat A_{TR}} \\ \hline L_{BL} U_{TL} = \widehat A_{TL}. \end{array} \end{equation*}

At the top of the loop, after repartitioning, $A$ then contains

\begin{equation*} \begin{array}{l} \FlaThreeByThreeBR {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c | c c} {\color{red}{L \backslash U_{00}}} \amp \color{black}{\widehat a_{01}} \amp \color{black}{\widehat A_{02}} \\ \hline {\color{red}{l_{10}^T}} \amp \widehat \alpha_{11} \amp \widehat a_{12}^T \\ {\color{red}{L_{20}}} \amp \widehat a_{21} \amp \widehat A_{22} \end{array} \right) \wedge \begin{array}{r} {\color{red} {L_{00} U_{00}}} = \widehat A_{00} \\ \hline {\color{red} {l_{10}^T U_{00}}} = \widehat a_{10}^T \\ {\color{red} {L_{20} U_{00}}} = \widehat A_{20} \end{array} \end{array} \end{equation*}

while after updating $A$ it must contain

\begin{equation*} \begin{array}{l} \FlaThreeByThreeTL {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c c | c} {\color{red} {L \backslash U_{00}}} \amp {\color{blue} {u_{01}}} \amp {\color{black} {\widehat A_{02}}} \\ {\color{red} {l_{10}^T}} \amp {\color{blue} {\upsilon_{11}}} \amp {\color{black} {\widehat a_{12}^T}} \\ \hline {\color{red} {L_{20}}} \amp {\color{blue} {l_{21}}} \amp \widehat A_{22}, \end{array} \right) \\ ~~~~~ \wedge \begin{array}[t]{c} \underbrace{ % \small \begin{array}{r r} \left( \begin{array}{c c} L_{00} \amp 0 \\ l_{10}^T \amp 1 \end{array} \right) \left( \begin{array}{c c} U_{00} \amp u_{01} \\ 0 \amp \upsilon_{11} \end{array} \right) = \left( \begin{array}{c c} \widehat A_{00} \amp \widehat a_{01} \\ \widehat a_{10}^T \amp \widehat \alpha_{11} \end{array} \right) % \amp % \phantom{ % \left( \begin{array}{c c} % L_{00} \amp 0 \\ % l_{10}^T \amp 1 % \end{array} \right) % \left( \begin{array}{c c} % U_{02} \\ % u_{12}^T % \end{array} \right) % = % \left( \begin{array}{c c} % \widehat A_{00} \\ % \widehat a_{12}^T % \end{array} \right) } \\ \hline \left( \begin{array}{c c} L_{20} \amp l_{21} \end{array} \right) \left( \begin{array}{c c} U_{00} \amp u_{01} \\ 0 \amp \upsilon_{11} \end{array} \right) = \left( \begin{array}{c c} \widehat A_{20} \amp \widehat a_{21} \end{array} \right) \amp \end{array} }\\ \begin{array}{r r| r } { { \color{red} {L_{00} U_{00}} } = \widehat A_{00} } \amp { {\color{red}{ L_{00}}} {\color{blue} {u_{01}}} = \widehat a_{01} } % \amp % { {\color{red}{ % L_{00}}} {\color{red} {U_{02}}} = \widehat A_{02} % } \\ { {\color{red} {l_{10}^T}} {\color{red} {U_{00}}} = \widehat a_{10}^T } \amp { {\color{red} {l_{10}^T}} {\color{blue} {u_{01}}} + {\color{blue} {\upsilon_{11}}} = \widehat \alpha_{11} } % \amp { {\color{red} {l_{10}^T}} % {\color{red} {U_{02}}} + {\color{blue} % {u_{12}^T}} = \widehat a_{12}^T } \\ \hline { {\color{red} {L_{20}}} {\color{red} {U_{00}}} = \widehat A_{20} } \amp { {\color{red} {L_{20}}} {\color{blue} {u_{01}}} + {\color{blue} {l_{21}}}{\color{blue} {\upsilon_{11}}} = \widehat a_{21} } \end{array} \end{array} \end{array} \end{equation*}

for the loop invariant to again hold after the iteration. With this, we can compute the desired parts of $L$ and $U \text{:}$

• Solve $L_{00} u_{01} = a_{01} \text{,}$ overwriting $a_{01}$ with the result.

• Update $\alpha_{11} := \upsilon_{11} = \alpha_{11} - l_{10}^T u_{01} = \alpha_{11} - a_{10}^T a_{01} \text{.}$

• Update $a_{21} := l_{21} = ( a_{21} - L_{20} u_{01} ) / \upsilon_{11} = ( a_{21} - A_{20} a_{10} ) / \alpha_{11} \text{.}$

The resulting algorithm is captured in Figure 5.33.

###### Homework5.6.3.

If $A$ is $n \times n \text{,}$ show the cost of Variant 3 is approximately $\frac{2}{3} n^3 \text{.}$

Solution

During the $k$th iteration, $A_{00}$ is $k \times k \text{,}$ for $k = 0, \ldots, n-1 \text{.}$ Then the (approximate) cost of the steps are given by

• Solve $L_{00} u_{01} = a_{01} \text{,}$ overwriting $a_{01}$ with the result. Approximate cost: $k^2$ flops.

• Update $\alpha_{11} := \upsilon_{11} = \alpha_{11} - l_{10}^T u_{01} = \alpha_{11} - a_{10}^T a_{01} \text{.}$ Approximate cost: $2k$ flops.

• Update $a_{21} := l_{21} = ( a_{21} - L_{20} u_{01} ) / \upsilon_{11} = ( a_{21} - A_{20} a_{10} ) / \alpha_{11} \text{.}$ Approximate cost: $2 ( n-k-1 )$ flops.

Thus, the total cost is approximately given by

\begin{equation*} \begin{array}{l} \sum_{k=0}^{n-1} \left( k^2 + 2k + 2 k ( n-k-1) \right) \\ ~~~=~~~~ \lt \mbox{ simplify } \gt \\ \sum_{k=0}^{n-1} \left( 2 k n -k^2 ) \right) \\ ~~~=~~~~ \lt \mbox{ algebra } \gt \\ 2 n \sum_{k=0}^{n-1} k - \sum_{k=0}^{n-1} k^2 \\ ~~~\approx~~~~ \lt \sum_{k=0}^{n-1} k \approx n^2/2; \sum_{k=0}^{n-1} k^2 \approx k^3/3 \gt \\ n^3 - \frac{n^3}{3} \\ ~~~=~~~~ \lt \mbox{ simplify } \gt \\ \frac{2}{3} n^3. \end{array} \end{equation*}

### Subsubsection5.6.1.4Variant 4: Crout algorithm

Consider next the loop invariant:

\begin{equation*} \FlaTwoByTwo {A_{TL} }{ A_{TR} } {A_{BL}}{A_{BR}} = \FlaTwoByTwo {\color{red}{ L \backslash U_{TL}} }{ \color{red}{U_{TR}}} { \color{red}{L_{BL}}}{ \widehat A_{BR}} \wedge \begin{array}{r | r} L_{TL} U_{TL} = \widehat A_{TL} \amp L_{TL} U_{TR} = \widehat A_{TR} \\ \hline L_{BL} U_{TL} = \widehat A_{TL}. \end{array} \end{equation*}

At the top of the loop, after repartitioning, $A$ then contains

\begin{equation*} \begin{array}{l} \FlaThreeByThreeBR {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c | c c} {\color{red}{L \backslash U_{00}}} \amp \color{red}{u_{01}} \amp \color{red}{U_{02}} \\ \hline {\color{red}{l_{10}^T}} \amp \widehat \alpha_{11} \amp \widehat a_{12}^T \\ {\color{red}{L_{20}}} \amp \widehat a_{21} \amp \widehat A_{22} \end{array} \right) \\ ~~~~~ \wedge \begin{array}{r | r r} {\color{red} {L_{00} U_{00}}} = \widehat A_{00} \amp {\color{red} {L_{00} u_{01}}} = \widehat a_{01} \amp {\color{red} {L_{00} U_{02}}} = \widehat A_{02} \\ \hline {\color{red} {l_{10}^T U_{00}}} = \widehat a_{10}^T \\ {\color{red} {L_{20} U_{00}}} = \widehat A_{20} \end{array} \end{array} \end{equation*}

while after updating $A$ it must contain

\begin{equation*} \begin{array}{l} \FlaThreeByThreeTL {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c c | c} {\color{red} {L \backslash U_{00}}} \amp {\color{red} {u_{01}}} \amp {\color{red} {U_{02}}} \\ {\color{red} {l_{10}^T}} \amp {\color{blue} {\upsilon_{11}}} \amp {\color{blue} {u_{12}^T}} \\ \hline {\color{red} {L_{20}}} \amp {\color{blue} {l_{21}}} \amp \widehat A_{22}, \end{array} \right) \\ ~~~~~ \wedge \begin{array}[t]{c} \underbrace{ % \small \begin{array}{r|r} \left( \begin{array}{c c} L_{00} \amp 0 \\ l_{10}^T \amp 1 \end{array} \right) \left( \begin{array}{c c} U_{00} \amp u_{01} \\ 0 \amp \upsilon_{11} \end{array} \right) = \left( \begin{array}{c c} \widehat A_{00} \amp \widehat a_{01} \\ \widehat a_{10}^T \amp \widehat \alpha_{11} \end{array} \right) \amp \left( \begin{array}{c c} L_{00} \amp 0 \\ l_{10}^T \amp 1 \end{array} \right) \left( \begin{array}{c c} U_{02} \\ u_{12}^T \end{array} \right) = \left( \begin{array}{c c} \widehat A_{00} \\ \widehat a_{12}^T \end{array} \right) \\ \hline \left( \begin{array}{c c} L_{20} \amp l_{21} \end{array} \right) \left( \begin{array}{c c} U_{00} \amp u_{01} \\ 0 \amp \upsilon_{11} \end{array} \right) = \left( \begin{array}{c c} \widehat A_{20} \amp \widehat a_{21} \end{array} \right) \amp \end{array} }\\ \begin{array}{r r| r } { { \color{red} {L_{00} U_{00}} } = \widehat A_{00} } \amp { {\color{red}{ L_{00}}} {\color{red} {u_{01}}} = \widehat a_{01} } \amp { {\color{red}{ L_{00}}} {\color{red} {U_{02}}} = \widehat A_{02} } \\ { {\color{red} {l_{10}^T}} {\color{red} {U_{00}}} = \widehat a_{10}^T } \amp { {\color{red} {l_{10}^T}} {\color{red} {u_{01}}} + {\color{blue} {\upsilon_{11}}} = \widehat \alpha_{11} } \amp { {\color{red} {l_{10}^T}} {\color{red} {U_{02}}} + {\color{blue} {u_{12}^T}} = \widehat a_{12}^T } \\ \hline { {\color{red} {L_{20}}} {\color{red} {U_{00}}} = \widehat A_{20} } \amp { {\color{red} {L_{20}}} {\color{red} {u_{01}}} + {\color{blue} {l_{21}}}{\color{blue} {\upsilon_{11}}} = \widehat a_{21} } \end{array} \end{array} \end{array} \end{equation*}

for the loop invariant to again hold after the iteration. With this, we can compute the desired parts of $L$ and $U \text{:}$

• Update $\alpha_{11} := \upsilon_{11} = \alpha_{11} - l_{10}^T u_{01} = \alpha_{11} - a_{10}^T a_{01} \text{.}$

• Update $a_{12}^T := u_{12}^T = a_{12}^T - l_{10}^T U_{02} = a_{12}^T - a_{10}^T A_{02} \text{.}$

• Update $a_{21} := l_{21} = ( a_{21} - L_{20} u_{01} ) / \upsilon_{11} = ( a_{21} - A_{20} a_{10} ) / \alpha_{11} \text{.}$

The resulting algorithm is captured in Figure 5.34.

###### Homework5.6.4.

If $A$ is $n \times n \text{,}$ show the cost of Variant 4 is approximately $\frac{2}{3} n^3 \text{.}$

Solution

During the $k$th iteration, $A_{00}$ is $k \times k \text{,}$ for $k = 0, \ldots, n-1 \text{.}$ Then the (approximate) cost of the steps are given by

• Update $\alpha_{11} := \upsilon_{11} = \alpha_{11} - l_{10}^T u_{01} = \alpha_{11} - a_{10}^T a_{01} \text{.}$ Approximate cost: $2 k$ flops.

• Update $a_{12}^T := u_{12}^T = a_{12}^T - l_{10}^T U_{02} = a_{12}^T - a_{10}^T A_{02} \text{.}$ Approximate cost: $2 k( n-k-1)$ flops.

• Update $a_{21} := l_{21} = ( a_{21} - L_{20} u_{01} ) / \upsilon_{11} = ( a_{21} - A_{20} a_{10} ) / \alpha_{11} \text{.}$ Approximate cost: $2 k( n-k-1) + (n-k-1)$ flops.

Thus, ignoring the $2 k$ flops for the dot product and the $n-k-1$ flops for multiplying with $1 / \alpha_{11}$ in each iteration, the total cost is approximately given by

\begin{equation*} \begin{array}{l} \sum_{k=0}^{n-1} 4 k ( n-k-1) \\ ~~~\approx~~~~ \lt \mbox{ remove lower order term } \gt \sum_{k=0}^{n-1} 4 k ( n-k) \\ ~~~=~~~~ \lt \mbox{ algebra } \gt \\ 4 n \sum_{k=0}^{n-1} k - 4 \sum_{k=0}^{n-1} k^2 \\ ~~~\approx~~~~ \lt \sum_{k=0}^{n-1} k \approx n^2/2; \sum_{k=0}^{n-1} k^2 \approx k^3/3 \gt \\ 2 n^3 - 4 \frac{n^3}{3} \\ ~~~=~~~~ \lt \mbox{ simplify } \gt \\ \frac{2}{3} n^3. \end{array} \end{equation*}

### Subsubsection5.6.1.5Variant 5: Classical Gaussian elimination

Consider final loop invariant:

\begin{equation*} \FlaTwoByTwo {A_{TL} }{ A_{TR} } {A_{BL}}{A_{BR}} = \FlaTwoByTwo {\color{red}{ L \backslash U_{TL}} }{ \color{red}{U_{TR}}} { \color{red}{L_{BL}}}{ \widehat A_{BR}- \color{red} {L_{BL} U_{TR} }} \wedge \begin{array}{r | r} L_{TL} U_{TL} = \widehat A_{TL} \amp L_{TL} U_{TR} = \widehat A_{TR} \\ \hline L_{BL} U_{TL} = \widehat A_{TL}. \end{array} \end{equation*}

At the top of the loop, after repartitioning, $A$ then contains

\begin{equation*} \begin{array}{l} \FlaThreeByThreeBR {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c | c c} {\color{red}{L \backslash U_{00}}} \amp \color{red}{u_{01}} \amp \color{red}{U_{02}} \\ \hline {\color{red}{l_{10}^T}} \amp \widehat \alpha_{11} - {\color{red}{l_{10}^T}} {\color{red}{u_{01}}} \amp \widehat a_{12}^T - {\color{red}{l_{10}^T U_{02}}} \\ {\color{red}{L_{20}}} \amp \widehat a_{21}- {\color{red}{L_{20} u_{01}}} \amp \widehat A_{22} - {\color{red}{L_{20} U_{02}}} \end{array} \right) \\ ~~~~~ \wedge \begin{array}{r | r r} {\color{red} {L_{00} U_{00}}} = \widehat A_{00} \amp {\color{red} {L_{00} u_{01}}} = \widehat a_{01} \amp {\color{red} {L_{00} U_{02}}} = \widehat A_{02} \\ \hline {\color{red} {l_{10}^T U_{00}}} = \widehat a_{10}^T \\ {\color{red} {L_{20} U_{00}}} = \widehat A_{20} \end{array} \end{array} \end{equation*}

while after updating $A$ it must contain

\begin{equation*} \begin{array}{l} \FlaThreeByThreeTL {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} = \left( \begin{array}{c c | c} {\color{red} {L \backslash U_{00}}} \amp {\color{red} {u_{01}}} \amp {\color{red} {U_{02}}} \\ {\color{red} {l_{10}^T}} \amp {\color{blue} {\upsilon_{11}}} \amp {\color{blue} {u_{12}^T}} \\ \hline {\color{red} {L_{20}}} \amp {\color{blue} {l_{21}}} \amp \widehat A_{22} -{\color{blue} {l_{21}}} {\color{blue} {u_{12}^T}} , \end{array} \right) \\ ~~~~~ \wedge \begin{array}[t]{c} \underbrace{ % \small \begin{array}{r|r} \left( \begin{array}{c c} L_{00} \amp 0 \\ l_{10}^T \amp 1 \end{array} \right) \left( \begin{array}{c c} U_{00} \amp u_{01} \\ 0 \amp \upsilon_{11} \end{array} \right) = \left( \begin{array}{c c} \widehat A_{00} \amp \widehat a_{01} \\ \widehat a_{10}^T \amp \widehat \alpha_{11} \end{array} \right) \amp \left( \begin{array}{c c} L_{00} \amp 0 \\ l_{10}^T \amp 1 \end{array} \right) \left( \begin{array}{c c} U_{02} \\ u_{12}^T \end{array} \right) = \left( \begin{array}{c c} \widehat A_{00} \\ \widehat a_{12}^T \end{array} \right) \\ \hline \left( \begin{array}{c c} L_{20} \amp l_{21} \end{array} \right) \left( \begin{array}{c c} U_{00} \amp u_{01} \\ 0 \amp \upsilon_{11} \end{array} \right) = \left( \begin{array}{c c} \widehat A_{20} \amp \widehat a_{21} \end{array} \right) \amp \end{array} }\\ \begin{array}{r r| r } { { \color{red} {L_{00} U_{00}} } = \widehat A_{00} } \amp { {\color{red}{ L_{00}}} {\color{red} {u_{01}}} = \widehat a_{01} } \amp { {\color{red}{ L_{00}}} {\color{red} {U_{02}}} = \widehat A_{02} } \\ { {\color{red} {l_{10}^T}} {\color{red} {U_{00}}} = \widehat a_{10}^T } \amp { {\color{red} {l_{10}^T}} {\color{red} {u_{01}}} + {\color{blue} {\upsilon_{11}}} = \widehat \alpha_{11} } \amp { {\color{red} {l_{10}^T}} {\color{red} {U_{02}}} + {\color{blue} {u_{12}^T}} = \widehat a_{12}^T } \\ \hline { {\color{red} {L_{20}}} {\color{red} {U_{00}}} = \widehat A_{20} } \amp { {\color{red} {L_{20}}} {\color{red} {u_{01}}} + {\color{blue} {l_{21}}}{\color{blue} {\upsilon_{11}}} = \widehat a_{21} } \end{array} \end{array} \end{array} \end{equation*}

for the loop invariant to again hold after the iteration. With this, we can compute the desired parts of $L$ and $U \text{:}$

• $\alpha_{11} := \upsilon_{11} = \widehat \alpha_{11} - l_{10}^T u_{01} = \alpha_{11}$ (no-op).

($\alpha_{11}$ already equals $\widehat \alpha_{11} - l_{01}^T u_{01}\text{.}$)

• $a_{12}^T := u_{12}^T = \widehat a_{12}^T - l_{10}^T U_{02} = a_{12}^T$ (no-op).

($a_{12}^T$ already equals $\widehat a_{12}^T - l_{01}^T U_{02}\text{.}$)

• Update $a_{21} := (\widehat a_{21} - L_{20} u_{01}) / \upsilon_{11} = a_{21} / \alpha_{11} \text{.}$

($a_{21}$ already equals $\widehat a_{21} - L_{20} u_{01}\text{.}$)

• Update $A_{22} := \widehat A_{22} - L_{20} U_{02} - l_{21} u_{12}^T = A_{22} - a_{21} a_{12}^T$

($A_{22}$ already equals $\widehat A_{22} - L_{20} U_{02}\text{.}$)

The resulting algorithm is captured in Figure 5.35.

###### Homework5.6.5.

If $A$ is $n \times n \text{,}$ show the cost of Variant 5 is approximately $\frac{2}{3} n^3 \text{.}$

Solution

During the $k$th iteration, $A_{00}$ is $k \times k \text{,}$ for $k = 0, \ldots, n-1 \text{.}$ Then the (approximate) cost of the steps are given by

• Update $a_{21} := l_{21} = a_{21} / \alpha_{11} \text{.}$ Approximate cost: $k$ flops.

• Update $A_{22} := A_{22} - l_{21} u_{12}^T = A_{22} - a_{21} a_{12}^T \text{.}$ Approximate cost: $2 (n-k-1) (n-k-1)$ flops.

Thus, ignoring $n-k-1$ flops for multiplying with $1 / \alpha_{11}$ in each iteration, the total cost is approximately given by

\begin{equation*} \begin{array}{l} \sum_{k=0}^{n-1} 2 ( n-k-1)^2 \\ ~~~=~~~~ \lt \mbox{ change of variable } j = n-k-1 \gt \\ 2 \sum_{j=0}^{n-1} j^2 \\ ~~~\approx~~~~ \lt \sum_{k=0}^{n-1} k^2 \approx k^3/3 \gt \\ 2 \frac{n^3}{3} \\ \end{array} \end{equation*}

### Subsubsection5.6.1.6Discussion

###### Remark5.36.

For a discussion of the different LU factorization algorithms that also gives a historic perspective, we recommend "Matrix Algorithms Volume 1" by G.W. Stewart~\cite{Stewart:fact}.