## Subsection5.2.2LU factorization: The right-looking algorithm

In the launch of this week, we mentioned an algorithm that computes the LU factorization of a given matrix $A$ so that

\begin{equation*} A = L U, \end{equation*}

where $L$ is a unit lower triangular matrix and $U$ is an upper triangular matrix. We now derive that algorithm, which is often called the right-looking algorithm for computing the LU factorization.

Partition $A \text{,}$ $L \text{,}$ and $U$ as follows:

\begin{equation*} A \rightarrow \FlaTwoByTwoSingleLine { \alpha_{11} }{ a_{12}^T } { a_{21} }{ A_{22} }, \quad L \rightarrow \FlaTwoByTwoSingleLine { 1 }{ 0 } { l_{21} }{ L_{22} }, \quad \mbox{and} \quad U \rightarrow \FlaTwoByTwoSingleLine { \upsilon_{11} }{ u_{12}^T } { 0 }{ U_{22} }. \end{equation*}

Then $A = L U$ means that

\begin{equation*} \FlaTwoByTwoSingleLine { \alpha_{11} }{ a_{12}^T } { a_{21} }{ A_{22} } = \FlaTwoByTwoSingleLine { 1 }{ 0 } { l_{21} }{ L_{22} } \FlaTwoByTwoSingleLine { \upsilon_{11} }{ u_{12}^T } { 0 }{ U_{22} } = \FlaTwoByTwoSingleLine { \upsilon_{11} }{ u_{12}^T } { l_{21} \upsilon_{11} }{ l_{21} u_{12}^T + L_{22} U_{22} }. \end{equation*}

Hence

\begin{equation*} \FlaTwoByTwoSingleLineNoPar { \alpha_{11} = \upsilon_{11} }{ a_{12}^T = u_{12}^T } { a_{21} = \upsilon_{11} l_{21}}{ A_{22} = l_{21} u_{12}^T + L_{22} U_{22} } \end{equation*}

or, equivalently,

\begin{equation*} \FlaTwoByTwoSingleLineNoPar { \alpha_{11} = \upsilon_{11} }{ a_{12}^T = u_{12}^T } { a_{21} = \upsilon_{11} l_{21}}{ A_{22} - l_{21} u_{12}^T = L_{22} U_{22}. } \end{equation*}

If we overwrite the upper triangular part of $A$ with $U$ and the strictly lower triangular part of $A$ with the strictly lower triangular part of $L$ (since we know that its diagonal consists of ones), we deduce that we must perform the computations

• $a_{21} := l_{21} = a_{21} / \alpha_{11} \text{.}$

• $A_{22} \becomes A_{22} - l_{21} a_{12}^T = A_{22} - {\color{blue} a}_{21} a_{12}^T \text{.}$

• Continue by computing the LU factorization of the updated $A_{22} \text{.}$

The resulting algorithm is given in Figure 5.2.2.1.

Before we discuss the cost of this algorithm, let us discuss a trick that is often used in the analysis of the cost of algorithms in linear algebra. We can approximate sums with integrals:

\begin{equation*} \sum_{k=0}^{n-1} k^p \approx \int_{0}^n x^p dx = \frac{1}{p+1} \left. x^{p+1} \right\vert_{0}^n = \frac{1}{p+1} n^{p+1}. \end{equation*}
###### Homework5.2.2.1.

Give the approximate cost incurred by the algorithm in Figure 5.2.2.1 when applied to an $n \times n$ matrix.

Approximately $\frac{2}{3} n^3$ flops.

Solution

Consider the iteration where $A_{TL}$ is (initially) $k \times k \text{.}$ Then

• $a_{21}$ is of size $n-k-1 \text{.}$ Thus $a_{21} := a_{21} / \alpha_{11}$ is typically computed by first computing $1/\alpha_{11}$ and then $a_{21} := (1/\alpha_{11} ) a_{21} \text{,}$ which requires $(n-k-1)$ flops. (The cost of computing $1/\alpha_{11}$ is inconsequential when $n$ is large, so it is usually ignored.)

• $A_{22}$ is of size $(n-k-1) \times ( n-k-1)$ and hence the rank-1 update $A_{22} := A_{22} - a_{21} a_{12}^T$ requires $2 ( n-k-1 ) ( n-k-1 )$ flops.

Now, the cost of updating $a_{21}$ is small relative to that of the update of $A_{22}$ and hence will be ignored. Thus, the total cost is given by, approximately,

\begin{equation*} \sum_{k=0}^{n-1} 2 ( n-k-1 )^2 \mbox{ flops}. \end{equation*}

Let us now simplify this:

\begin{equation*} \begin{array}{l} \sum_{k=0}^{n-1} 2 ( n-k-1 )^2 \\ ~~~=~~~~ \lt \mbox{ change of variable: } j = n- k-1 \gt \\ \sum_{j=0}^{n-1} 2 j^2 \\ ~~~=~~~~ \lt \mbox{ algebra } \gt \\ 2 \sum_{j=0}^{n-1} j^2 \\ ~~~ \approx ~~~~ \lt \sum_{j=0}^{n-1} j^2 \approx \int_0^n x^2 dx = n^3/3 \gt \\ \frac{2}{3} n^3 \end{array} \end{equation*}
###### Homework5.2.2.2.

Give the approximate cost incurred by the algorithm in Figure 5.2.2.1 when applied to an $m \times n$ matrix.

Approximately $m n^2 - \frac{1}{3} n^3$ flops.

Solution

Consider the iteration where $A_{TL}$ is (initially) $k \times k \text{.}$ Then

• $a_{21}$ is of size $m-k-1 \text{.}$ Thus $a_{21} := a_{21} / \alpha_{11}$ is typically computed by first computing $1/\alpha_{11}$ and then $a_{21} := (1/\alpha_{11} ) a_{21} \text{,}$ which requires $(m-k-1)$ flops. (The cost of computing $1/\alpha_{11}$ is inconsequential when $m$ is large.)

• $A_{22}$ is of size $(m-k-1) \times ( n-k-1)$ and hence the rank-1 update $A_{22} := A_{22} - a_{21} a_{12}^T$ requires $2 ( m-k-1 ) ( n-k-1 )$ flops.

Now, the cost of updating $a_{21}$ is small relative to that of the update of $A_{22}$ and hence will be ignored. Thus, the total cost is given by, approximately,

\begin{equation*} \sum_{k=0}^{n-1} 2 ( m-k-1 ) ( n-k-1 ) \mbox{ flops}. \end{equation*}

Let us now simplify this:

\begin{equation*} \begin{array}{l} \sum_{k=0}^{n-1} 2 ( m-k-1 ) ( n-k-1 ) \\ ~~~=~~~~ \lt \mbox{ change of variable: } j = n- k-1 \gt \\ \sum_{j=0}^{n-1} 2 ( m-(n-j-1)-1 ) j \\ ~~~=~~~~ \lt \mbox{ simplify } \gt \\ \sum_{j=0}^{n-1} 2 ( m-n + j ) j \\ ~~~=~~~~ \lt \mbox{ algebra } \gt \\ 2 (m-n) \sum_{j=0}^{n-1} j + 2 \sum_{j=0}^{n-1} j^2 \\ ~~~ \approx ~~~~ \lt \sum_{j=0}^{n-1} j \approx n^2/2 \mbox{ and } \sum_{j=0}^{n-1} j^2 \approx n^3/3 \gt \\ (m-n) n^2 + \frac{2}{3} n^3 \\ ~~~=~~~~ \lt \mbox{ simplify } \gt \\ m n^2 - \frac{1}{3} n^3 \end{array} \end{equation*}
###### Remark5.2.2.2.

In a practical application of LU factorization, it is uncommon to factor a non-square matrix. However, high-performance implementations of the LU factorization that use "blocked" algorithms perform a factorization of a rectangular submatrix of $A \text{,}$ which is why we generalize beyond the square case.

###### Homework5.2.2.3.

It is a good idea to perform a "git pull" in the Assignments directory to update with the latest files before you start new programming assignments.

Implement the algorithm given in Figure 5.2.2.1 as

function [ A_out ] = LU_right_looking( A )


by completing the code in Assignments/Week05/matlab/LU_right_looking.m. Input is an $m \times n$ matrix $A\text{.}$ Output is the matrix $A$ that has been overwritten by the LU factorization. You may want to use Assignments/Week05/matlab/test_LU_right_looking.m to check your implementation.

Solution

See Assignments/Week05/answers/LU_right_looking.m. (Assignments/Week05/answers/LU_right_looking.m)