ALAFF Solving with a triangular matrix

Subsection 5.3.5 Solving with a triangular matrix

We are left to discuss how to solve \(L z = y \) and \(U x = z \text{.}\)

Subsubsection 5.3.5.1 Algorithmic Variant 1

Consider \(L z = y \) where \(L \) is unit lower triangular. Partition

\begin{equation*} L \rightarrow \FlaTwoByTwoSingleLine {1}{0} {l_{21}}{L_{22}}, \quad z \rightarrow \FlaTwoByOneSingleLine { \zeta_1} { z_2 } \quad \mbox{and} \quad y \rightarrow \FlaTwoByOneSingleLine { \psi_1} { y_2 } . \end{equation*}

Then

\begin{equation*} \begin{array}[t]{c} \underbrace{ \FlaTwoByTwoSingleLine {1}{0} {l_{21}}{L_{22}} } \\ L \end{array} \begin{array}[t]{c} \underbrace{ \FlaTwoByOneSingleLine { \zeta_1} { z_2 } } \\ z \end{array} = \begin{array}[t]{c} \underbrace{ \FlaTwoByOneSingleLine { \psi_1} { y_2 } } \\ y \end{array} . \end{equation*}

Multiplying out the left-hand side yields

\begin{equation*} \FlaTwoByOneSingleLine {\zeta_1} {\zeta_1 l_{21} + L_{22} z_2 } = \FlaTwoByOneSingleLine { \psi_1} { y_2 } \end{equation*}

and the equalities

\begin{equation*} \begin{array}{rcl} {\zeta_1} \amp = \amp \psi_1 \\ {\zeta_1 l_{21} + L_{22} z_2 } \amp = \amp y_2, \end{array} \end{equation*}

which can be rearranged as

\begin{equation*} \begin{array}{rcl} {\zeta_1} \amp = \amp \psi_1 \\ {L_{22} z_2 } \amp = \amp y_2 - \zeta_1 l_{21} . \end{array} \end{equation*}

We conclude that in the current iteration

\(\psi_1 \) needs not be updated.
\(y_2 := y_2 - \psi_1 l_{21} \)

So that in future iterations \(L_{22} z_2 = y_2 \) (updated!) will be solved, updating \(z_2 \text{.}\)

These insights justify the algorithm in Figure 5.3.5.1, which overwrites \(y \) with the solution to \(L z = y \text{.}\)

\begin{equation*} \newcommand{\routinename}{\mbox{Solve } L z = y, \mbox{ overwriting } y \mbox{ with } z \mbox{ (Variant 1)}} \newcommand{\guard}{ n( L_{TL} ) \lt n( L ) } \newcommand{\partitionings}{ L \rightarrow \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}}, y \rightarrow \FlaTwoByOne{y_T}{y_B} } \newcommand{\partitionsizes}{ L_{TL} {\rm ~is~} 0 \times 0 \mbox{ and } y_T \mbox{ has } 0 \mbox{ elements} } \newcommand{\repartitionings}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \rightarrow \FlaThreeByThreeBR{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}}, \FlaTwoByOne{y_T}{y_B} \rightarrow \FlaThreeByOneB{y_0}{\psi_1}{y_2} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \leftarrow \FlaThreeByThreeTL{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}}, \FlaTwoByOne{y_T}{y_B} \leftarrow \FlaThreeByOneT{y_0}{\psi_1}{y_2} } \newcommand{\update}{ \begin{array}{l} y_2 := y_2 - \psi_1 l_{21} \end{array} } \FlaAlgorithm \end{equation*}

Figure 5.3.5.1. Lower triangular solve (with unit lower triangular matrix), Variant 1

Homework 5.3.5.1.

Derive a similar algorithm for solving \(U x = z \text{.}\) Update the below skeleton algorithm with the result. (Don't forget to put in the lines that indicate how you "partition and repartition" through the matrix.)

\begin{equation*} \newcommand{\routinename}{\mbox{Solve } U x = z, \mbox{ overwriting } z \mbox{ with } x \mbox{ (Variant 1)}} \newcommand{\guard}{ n( U_{BR} ) \lt n( U ) } \newcommand{\partitionings}{ U \rightarrow \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}}, z \rightarrow \FlaTwoByOne{z_T}{z_B} } \newcommand{\partitionsizes}{ U_{BR} {\rm ~is~} 0 \times 0 \mbox{ and } z_B \mbox{ has } 0 \mbox{ elements} } \newcommand{\repartitionings}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \rightarrow \left( \begin{array}{c c c} {U_{00}} \amp {u_{01}} \amp {U_{02}} \\ {u_{10}^T} \amp {\upsilon_{11}} \amp {u_{12}^T} \\ {U_{20}} \amp {u_{21}} \amp {U_{22}} \end{array} \right), \FlaTwoByOne{z_T}{z_B} \rightarrow \left( \begin{array}{c c c} {z_0}\\ {\zeta_1}\\ {z_2} \end{array} \right) } \newcommand{\moveboundaries}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \leftarrow \left( \begin{array}{c c c} {U_{00}} \amp {u_{01}} \amp {U_{02}} \\ {u_{10}^T} \amp {\upsilon_{11}} \amp {u_{12}^T} \\ {U_{20}} \amp {u_{21}} \amp {U_{22}} \end{array} \right), \FlaTwoByOne{z_T}{z_B} \leftarrow \left( \begin{array}{c} {z_0}\\ {\zeta_1}\\ {z_2} \end{array} \right) } \newcommand{\update}{ \begin{array}{l} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \end{array} } \FlaAlgorithm \end{equation*}

Hint: Partition

\begin{equation*} \FlaTwoByTwoSingleLine{ U_{00} }{ u_{01} }{ 0 }{ \upsilon_{11}} \FlaTwoByOneSingleLine{ x_0 }{ \chi_1} = \FlaTwoByOneSingleLine{ z_0 }{ \zeta_1 }. \end{equation*}

Solution

Multiplying this out yields

\begin{equation*} \FlaTwoByOneSingleLine{ U_{00} x_0 + u_{01} \chi_1 }{\upsilon_{11} \chi_1 } = \FlaTwoByOneSingleLine{ z_0 }{ \zeta_1 }. \end{equation*}

So, \(\chi_1 = \zeta_1 / \upsilon_{11} \) after which \(x_0 \) can be computed by solving \(U_{00} x_0 = z_0 - \chi_1 u_{01} \text{.}\) The resulting algorithm is then given by

\begin{equation*} \newcommand{\routinename}{\mbox{Solve } U x = z, \mbox{ overwriting } z \mbox{ with } x \mbox{ (Variant 1)}} \newcommand{\guard}{ n( U_{BR} ) \lt n( U ) } \newcommand{\partitionings}{ U \rightarrow \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}}, z \rightarrow \FlaTwoByOne{z_T}{z_B} } \newcommand{\partitionsizes}{ U_{BR} {\rm ~is~} 0 \times 0 \mbox{ and } z_B \mbox{ has } 0 \mbox{ elements} } \newcommand{\repartitionings}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \rightarrow \FlaThreeByThreeTL{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}}, \FlaTwoByOne{z_T}{z_B} \rightarrow \FlaThreeByOneT{z_0}{\zeta_1}{z_2} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \leftarrow \FlaThreeByThreeBR{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}}, \FlaTwoByOne{z_T}{z_B} \leftarrow \FlaThreeByOneB{z_0}{\zeta_1}{z_2} } \newcommand{\update}{ \begin{array}{l} \zeta_1 := \zeta_1 / \upsilon_{11} \\ z_0 := z_0 - \zeta_1 u_{01} \end{array} } \FlaAlgorithm \end{equation*}

Subsubsection 5.3.5.2 Algorithmic Variant 2

An alternative algorithm can be derived as follows: Partition

\begin{equation*} L \rightarrow \FlaTwoByTwoSingleLine {L_{00}}{0} {l_{10}^T}{ 1 }, \quad z \rightarrow \FlaTwoByOneSingleLine { z_0 } { \zeta_1} \quad \mbox{and} \quad y \rightarrow \FlaTwoByOneSingleLine { y_0 } { \psi_1} . \end{equation*}

Then

\begin{equation*} \begin{array}[t]{c} \underbrace{ \FlaTwoByTwoSingleLine {L_{00}}{0} {l_{10}^T}{ 1} } \\ L \end{array} \begin{array}[t]{c} \underbrace{ \FlaTwoByOneSingleLine { z_0 } { \zeta_1} } \\ z \end{array} = \begin{array}[t]{c} \underbrace{ \FlaTwoByOneSingleLine { y_0 } { \psi_1} } \\ y \end{array} . \end{equation*}

Multiplying out the left-hand side yields

\begin{equation*} \FlaTwoByOneSingleLine { L_{00} z_0 } { l_{10}^T z_0 + \zeta_1} = \FlaTwoByOneSingleLine { y_0 } { \psi_1} \end{equation*}

and the equalities

\begin{equation*} \begin{array}{rcl} L_{00} z_0 \amp = \amp y_0 \\ {l_{10}^T z_0 + \zeta_1 } \amp = \amp \psi_1. \end{array} \end{equation*}

The idea now is as follows: Assume that the elements of \(z_0 \) were computed in previous iterations in the algorithm in Figure 5.3.5.2, overwriting \(y_0 \text{.}\) Then in the current iteration we must compute \(\zeta_1 := \psi_1 - l_{10}^T z_0 \text{,}\) overwriting \(\psi_1 \text{.}\)

\begin{equation*} \newcommand{\routinename}{\mbox{Solve } L z = y, \mbox{ overwriting } y \mbox{ with } z \mbox{ (Variant 2)}} \newcommand{\guard}{ n( L_{TL} ) \lt n( L ) } \newcommand{\partitionings}{ L \rightarrow \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}}, y \rightarrow \FlaTwoByOne{y_T}{y_B} } \newcommand{\partitionsizes}{ L_{TL} {\rm ~is~} 0 \times 0 \mbox{ and } y_T \mbox{ has } 0 \mbox{ elements} } \newcommand{\repartitionings}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \rightarrow \FlaThreeByThreeBR{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}}, \FlaTwoByOne{y_T}{y_B} \rightarrow \FlaThreeByOneB{y_0}{\psi_1}{y_2} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \leftarrow \FlaThreeByThreeTL{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}}, \FlaTwoByOne{y_T}{y_B} \leftarrow \FlaThreeByOneT{y_0}{\psi_1}{y_2} } \newcommand{\update}{ \begin{array}{l} \psi_1 := \psi_1 - l_{10}^T y_0 \end{array} } \FlaAlgorithm \end{equation*}

Figure 5.3.5.2. Lower triangular solve (with unit lower triangular matrix), Variant 2

Homework 5.3.5.2.

\begin{equation*} \newcommand{\routinename}{\mbox{Solve } U x = z, \mbox{ overwriting } z \mbox{ with } x \mbox{ (Variant 2)}} \newcommand{\guard}{ n( U_{BR} ) \lt n( U ) } \newcommand{\partitionings}{ U \rightarrow \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}}, z \rightarrow \FlaTwoByOne{z_T}{z_B} } \newcommand{\partitionsizes}{ U_{BR} {\rm ~is~} 0 \times 0 \mbox{ and } z_B \mbox{ has } 0 \mbox{ elements} } \newcommand{\repartitionings}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \rightarrow \left( \begin{array}{c c c} {U_{00}} \amp {u_{01}} \amp {U_{02}} \\ {u_{10}^T} \amp {\upsilon_{11}} \amp {u_{12}^T} \\ {U_{20}} \amp {u_{21}} \amp {U_{22}} \end{array} \right), \FlaTwoByOne{z_T}{z_B} \rightarrow \left( \begin{array}{c c c} {z_0}\\ {\zeta_1}\\ {z_2} \end{array} \right) } \newcommand{\moveboundaries}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \leftarrow \left( \begin{array}{c c c} {U_{00}} \amp {u_{01}} \amp {U_{02}} \\ {u_{10}^T} \amp {\upsilon_{11}} \amp {u_{12}^T} \\ {U_{20}} \amp {u_{21}} \amp {U_{22}} \end{array} \right), \FlaTwoByOne{z_T}{z_B} \leftarrow \left( \begin{array}{c} {z_0}\\ {\zeta_1}\\ {z_2} \end{array} \right) } \newcommand{\update}{ \begin{array}{l} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\\ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \end{array} } \FlaAlgorithm \end{equation*}

Hint: Partition

\begin{equation*} U \rightarrow \FlaTwoByTwoSingleLine{ \upsilon_{11} }{ u_{12}^T }{0}{U_{22}}. \end{equation*}

Solution

Partition

\begin{equation*} \FlaTwoByTwoSingleLine{ \upsilon_{11} }{ u_{12}^T }{0}{U_{22}} \FlaTwoByOneSingleLine{ \chi_1 }{ x_2 } = \FlaTwoByOneSingleLine{ \zeta_1 }{ z_2 }. \end{equation*}

Multiplying this out yields

\begin{equation*} \FlaTwoByOneSingleLine{ \upsilon_{11} \chi_1 + u_{12}^T x_2}{ U_{22} x_2 } = \FlaTwoByOneSingleLine{ \zeta_1 }{ z_2 }. \end{equation*}

So, if we assume that \(x_2 \) has already been computed and has overwritten \(z_2 \text{,}\) then \(\chi_1 \) can be computed as

\begin{equation*} \chi_1 = ( \zeta_1 - u_{12}^T x_2 ) / \upsilon_{11} \end{equation*}

which can then overwrite \(\zeta_1 \text{.}\) The resulting algorithm is given by

\begin{equation*} \newcommand{\routinename}{\mbox{Solve } U x = z, \mbox{ overwriting } z \mbox{ with } x \mbox{ (Variant 2)}} \newcommand{\guard}{ n( U_{BR} ) \lt n( U ) } \newcommand{\partitionings}{ U \rightarrow \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}}, z \rightarrow \FlaTwoByOne{z_T}{z_B} } \newcommand{\partitionsizes}{ U_{BR} {\rm ~is~} 0 \times 0 \mbox{ and } z_B \mbox{ has } 0 \mbox{ elements} } \newcommand{\repartitionings}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \rightarrow \FlaThreeByThreeTL{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}}, \FlaTwoByOne{z_T}{z_B} \rightarrow \FlaThreeByOneT{z_0}{\zeta_1}{z_2} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \leftarrow \FlaThreeByThreeBR{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}}, \FlaTwoByOne{z_T}{z_B} \leftarrow \FlaThreeByOneB{z_0}{\zeta_1}{z_2} } \newcommand{\update}{ \begin{array}{l} \zeta_1 := \zeta_1 - u_{12}^T z_2 \\ \zeta_1 := \zeta_1 / \upsilon_{11} \end{array} } \FlaAlgorithm \end{equation*}

Homework 5.3.5.3.

Let \(L \) be an \(m \times m \) unit lower triangular matrix. If a multiply and add each require one flop, what is the approximate cost of solving \(L x = y \text{?}\)

Solution

Let us analyze Variant 1.

Let \(L_{00} \) be \(k \times k \) in a typical iteration. Then \(y_2 \) is of size \(m - k - 1 \) and \(y_2 := y_2 - \psi_1 l_{21} \) requires \(2( m - k - 1 ) \) flops. Summing this over all iterations requires

\begin{equation*} \sum_{k=0}^{m-1} [ 2( m-k-1 ) ] \mbox{ flops}. \end{equation*}

The change of variables \(j = m - k - 1 \) yields

\begin{equation*} \sum_{k=0}^{m-1} [ 2( m-k-1 ) ] = 2 \sum_{j=0}^{m-1} j \approx m^2. \end{equation*}

Thus, the cost is approximately \(m^2 \) flops.

Subsubsection 5.3.5.3 Discussion

Computation tends to be more efficient when matrices are accessed by column, since in scientific computing applications tend to store matrices by columns (in column-major order). This dates back to the days when Fortran ruled supreme. Accessing memory consecutively improves performance, so computing with columns tends to be more efficient than computing with rows.

Variant 1 for each of the algorithms casts computation in terms of columns of the matrix that is involved;

\begin{equation*} y_2 := y_2 - \psi_1 l_{21} \end{equation*}

and

\begin{equation*} z_0 := z_0 - \zeta_1 u_{01}. \end{equation*}

These are called axpy operations:

\begin{equation*} y := \alpha x + y. \end{equation*}

"alpha times x plus y." In contrast, Variant 2 casts computation in terms of rows of the matrix that is involved:

\begin{equation*} \psi_1 := \psi_1 - l_{10}^T y_0 \end{equation*}

and

\begin{equation*} \zeta_1 := \zeta_1 - u_{12}^T z_2 \end{equation*}

perform dot products.