LAFF Triangular matrix-vector multiplication

Subsection 4.3.2 Triangular matrix-vector multiplication

Homework 4.3.2.1.

Write routines

[ y_out ] = Trmvp_un_unb_var1 ( U, x, y )
[ y_out ] = Trmvp_un_unb_var2( U, x, y )

that implement the algorithms in Figure~\ref{fig-utrsv} that compute \(y := U x + y \text{.}\)

Homework 4.3.2.2.

\label{hw:4322} Modify the algorithms in Figure~\ref{fig-4322} so that they compute \(y \becomes L x + y \text{,}\) where \(L \) is a lower triangular matrix: (Just strike out the parts that evaluate to zero. We suggest you do this homework in conjunction with the next one.)

Solution

Homework 4.3.2.3.

Write the \pythonversion{routines} \matlabversion{functions}

[ y_out ] = Trmvp_ln_unb_var1 ( L, x, y )
[ y_out ] = Trmvp_ln_unb_var2( L, x, y )

that implement the algorithms for computing \(y := L x + y \) from (((Unresolved xref, reference "EXFourThreeTwo"; check spelling or use "provisional" attribute))) .

Homework 4.3.2.4.

\label{hw:4324} Modify the algorithms in Figure~\ref{fig-4324} to compute \(x \becomes U x \text{,}\) where \(U \) is an upper triangular matrix. You may not use \(y \text{.}\) You have to overwrite \(x \) without using work space. Hint: Think carefully about the order in which elements of \(x \) are computed and overwritten. You may want to do this exercise hand-in-hand with the implementation in the next homework.

Solution

Homework 4.3.2.5.

Write routines

[ x_out ] = Trmv_un_unb_var1 ( U, x )
[ x_out ] = Trmv_un_unb_var2( U, x )

that implement the algorithms for computing \(x := U x \) from Homework~\ref{hw:4324}.

Homework 4.3.2.6.

\label{hw:4326} Modify the algorithms in Figure~\ref{fig-4326} to compute \(x \becomes L x \text{,}\) where \(L \) is a lower triangular matrix. You may not use \(y \text{.}\) You have to overwrite \(x \) without using work space.

Hint

Think carefully about the order in which elements of \(x \) are computed and overwritten. This question is VERY tricky... You may want to do this exercise hand-in-hand with the implementation in the next homework.

Solution

The key is that you have to march through the matrix and vector from the "bottom-right" to the "top-left". In other words, in the opposite direction!

Homework 4.3.2.7.

Write routines

[ y_out ] = Trmv_ln_unb_var1 ( L, x )
[ y_out ] = Trmv_ln_unb_var2( L, x )

that implement the algorithms from Homework~\ref{hw:4326} for computing \(x := L x \text{.}\)

Homework 4.3.2.8.

Develop algorithms for computing \(y \becomes U^T x + y \) and \(y \becomes L^T x + y \text{,}\) where \(U \) and \(L \) are respectively upper triangular and lower triangular. Do not explicitly transpose matrices \(U \) and \(L \text{.}\) Write routines

[ y_out ] = Trmvp_ut_unb_var1 ( U, x, y )
[ y_out ] = Trmvp_ut_unb_var2( U, x, y )
[ y_out ] = Trmvp_lt_unb_var1 ( L, x, y )
[ y_out ] = Trmvp_ln_unb_var2( L, x, y )

that implement these algorithms.

Homework 4.3.2.9.

Develop algorithms for computing \(x \becomes U^T x \) and \(x \becomes L^T x \text{,}\) where \(U \) and \(L \) are respectively upper triangular and lower triangular. Do not explicitly transpose matrices \(U \) and \(L \text{.}\) Write routines

[ y_out ] = Trmv_ut_unb_var1 ( U, x )
[ y_out ] = Trmv_ut_unb_var2( U, x )
[ y_out ] = Trmv_lt_unb_var1 ( L, x )
[ y_out ] = Trmv_ln_unb_var2( L, x )

that implement these algorithms.

Homework 4.3.2.10.

Compute the cost, in flops, of the algorithm for computing \(y := L x + y \) that uses \axpy\ s.

Solution

For the axpy based algorithm, the cost is in the updates

\(\psi_1 := \lambda_{11} \chi_1 + \psi_1 \) (which requires two flops) ; followed by
\(y_2 \becomes \chi_1 l_{21} + y_2 \text{.}\)

Now, during the first iteration, \(y_2 \) and \(l_{21}\) and \(x_2 \) are of size \(n-1 \text{,}\) so that that iteration requires \(2(n-1) + 2 = 2 n \) flops. During the \(k \)th iteration (starting with \(k = 0 \)), \(y_2 \) and \(l_{21} \) are of size \((n-k-1) \) so that the cost of that iteration is \(2( n-k-1 ) + 2 = 2 ( n-k ) \) flops. Thus, if \(L \) is an \(n \times n \) matrix, then the total cost is given by

\begin{equation*} \sum_{k=0}^{n-1} \left[ 2 (n-k) \right] = 2 \sum_{k=0}^{n-1} (n-k) = 2 ( n + (n-1) + \cdots + 1 ) = 2 \sum_{k=1}^{n} k = 2 (n+1) n / 2. \end{equation*}

flops. (Recall that we proved in the second week that \(\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \text{.}\))

Homework 4.3.2.11.

As hinted at before: Implementations achieve better performance (finish faster) if one accesses data consecutively in memory. Now, most scientific computing codes store matrices in "column-major order" which means that the first column of a matrix is stored consecutively in memory, then the second column, and so forth. Now, this means that an algorithm that accesses a matrix by columns tends to be faster than an algorithm that accesses a matrix by rows. That, in turn, means that when one is presented with more than one algorithm, one should pick the algorithm that accesses the matrix by columns.

Our FLAME notation makes it easy to recognize algorithms that access the matrix by columns. For example, in this unit, if the algorithm accesses submatrix \(a_{01}\) or \(a_{21} \) then it accesses columns. If it accesses submatrix \(a_{10}^T \) or \(a_{12}^T \text{,}\) then it accesses the matrix by rows.

For each of these, which algorithm accesses the matrix by columns:

For \(y := U x + y \text{,}\) Trsvp_un_unb_var1 or Trsvp_un_unb_var2?\\ Does the better algorithm use a dot or an axpy?
For \(y := L x + y \text{,}\) Trsvp_ln_unb_var1 Trsvp_ln_unb_var2? \\ Does the better algorithm use a dot or an axpy?
For \(y := U^T x + y \text{,}\) Trsvp_ut_unb_var1 or Trsvp_ut_unb_var2? \\ Does the better algorithm use a dot or an axpy?
For \(y := L^T x + y \text{,}\) Trsvp_lt_unb_var1 or Trsvp_lt_unb_var2? \\ Does the better algorithm use a dot or an axpy?

Let \(U \in \Rnxn \) be an upper triangular matrix and \(x \in \Rn \) be a vector. Consider

\begin{equation*} \begin{array}{rcl} U x \amp=\amp \FlaThreeByThreeBR {U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}} \FlaThreeByOneB {x_0} {\chi_1} {x_2} = \left( \begin{array}{r r I r | r r} -1 \amp 2 \amp 4 \amp 1 \amp 0\\ 0 \amp 0 \amp -1 \amp -2 \amp 1 \\ \whline 0 \amp 0 \amp 3 \amp 1 \amp 2\\ \hline 0 \amp 0 \amp 0 \amp 4 \amp 3 \\ 0 \amp 0 \amp 0 \amp 0 \amp 2 \end{array} \right) \left( \begin{array}{r} 1 \\ 2 \\ \whline 3 \\ \hline 4 \\ 5 \end{array} \right) \\ \amp=\amp \left( \begin{array}{c} \star \\ \star \\ \whline \left( \begin{array}{r} 0 \\ 0 \end{array} \right)^T \left( \begin{array}{r} 1 \\ 2 \end{array} \right) + (3) ( 3 ) + \left( \begin{array}{r} 1 \\ 2 \end{array} \right)^T \left( \begin{array}{r} 4 \\ 5 \end{array} \right) \\ \hline \star \\ \star \end{array} \right) = \left( \begin{array}{c} \star \\ \star \\ \whline (3) ( 3 ) + \left( \begin{array}{r} 1 \\ 2 \end{array} \right)^T \left( \begin{array}{r} 4 \\ 5 \end{array} \right) \\ \hline \star \\ \star \end{array} \right), \end{array} \end{equation*}

where \(\star \)s indicate components of the result that aren't important in our discussion right now. We notice that \(u_{10}^T = 0 \) (a vector of two zeroes) and hence we need not compute with it.

\begin{equation*} U \rightarrow \FlaTwoByTwo{ U_{TL}}{U_{TR}}{U_{BL}}{U_{BR}} = \FlaThreeByThreeBR {U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}} , \end{equation*}

where \(U_{TL} \) and \(U_{00} \) are square matrices. Then

\(U_{BL} = 0 \text{,}\) \(u_{10}^T = 0 \text{,}\) \(U_{20} = 0 \text{,}\) and \(u_{21} = 0 \text{,}\) where \(0 \) indicates a matrix or vector of the appropriate dimensions.
\(U_{TL} \) and \(U_{BR} \) are upper triangular matrices.

We will just state this as "intuitively obvious". Similarly, if

\begin{equation*} L \rightarrow \FlaTwoByTwo{ L_{TL}}{L_{TR}}{L_{BL}}{L_{BR}} = \FlaThreeByThreeBR {L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}} , \end{equation*}

where \(L_{TL} \) and \(L_{00} \) are square matrices, then

\(L_{TR} = 0 \text{,}\) \(l_{01} = 0 \text{,}\) \(L_{02} = 0 \text{,}\) and \(l_{12}^T = 0 \text{,}\) where \(0 \) indicates a matrix or vector of the appropriate dimensions.
\(L_{TL} \) and \(L_{BR} \) are lower triangular matrices.

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y \becomes \mbox{Trmvp_un_unb_var1}( U, x, y )} % Step 3: Loop-guard \renewcommand{\guard}{ m( U_{TL} ) \lt m( U ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ U \rightarrow \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ U_{TL} is 0 \times 0 , x_{T} , y_T are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \rightarrow \FlaThreeByThreeBR{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \upsilon_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \leftarrow \FlaThreeByThreeTL{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} \phantom{y_0 \becomes \chi_1 u_{01} + y_0} \\ \psi_1 \becomes \phantom{u_{10}^T x_0 + } \upsilon_{11} \chi_1 + u_{12}^T x_2 + \psi_1 \\ \phantom{y_0 \becomes \chi_1 u_{01} + y_0} \\ \end{array} } \FlaAlgorithm \end{equation*}

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y \becomes \mbox{Trmvp_un_unb_var2}( U, x, y )} % Step 3: Loop-guard \renewcommand{\guard}{ m( U_{TL} ) \lt m( U ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ U \rightarrow \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ U_{TL} is 0 \times 0 , x_{T} , y_T are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \rightarrow \FlaThreeByThreeBR{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \upsilon_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \leftarrow \FlaThreeByThreeTL{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} y_0 \becomes \chi_1 u_{01} + y_0 \\ \psi_1 \becomes \chi_1 \upsilon_{11} + \psi_1 \\ \phantom{y_2 \becomes \chi_1 u_{21} + y_2} \end{array} } \FlaAlgorithm \end{equation*}

Figure 4.3.2.1. Algorithms for computing \(y := U x + y \text{,}\) where \(U \) is upper triangular.

Let us start by focusing on \(y := U x + y \text{,}\) where \(U\) is upper triangular. The algorithms from the previous section can be restated as in Figure~\ref{fig-utrsv}, replacing \(A \) by \(U \text{.}\) Now, notice the parts in gray. Since \(u_{10}^T = 0 \) and \(u_{21} = 0 \text{,}\) those computations need not be performed! Bingo, we have two algorithms that take advantage of the zeroes below the diagonal. We probably should explain the names of the routines: \begin{quote} {Trmvp\_un\_unb\_var1}: \underline{Tr}iangular \underline{m}atrix-\underline{v}ector multiply \underline{p}lus (\(y \)), with \underline{u}pper triangular matrix that is \underline{n}ot transposed, \underline{unb}locked \underline{var}iant \underline{1}. \end{quote} (Yes, a bit convoluted, but such is life.)

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y := \mbox{Trmvp_ln_unb_var1}( L, x, y ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( L_{TL} ) \lt m( L ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ L \rightarrow \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ L_{TL} is 0 \times 0 , x_{T} , y_{T} are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \rightarrow \FlaThreeByThreeBR{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \lambda_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \leftarrow \FlaThreeByThreeTL{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} \phantom{y_0 \becomes \chi_1 l_{01} + y_0} \\ \psi_1 \becomes l_{10}^T x_0 + \lambda_{11} \chi_1 + l_{12}^T x_2 + \psi_1 \\ \phantom{y_0 \becomes \chi_1 l_{01} + y_0} \\ \end{array} } \FlaAlgorithm \end{equation*}

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y := \mbox{Trmvp\_ln\_unb\_var2}( L, x, y ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( L_{TL} ) \lt m( L ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ L \rightarrow \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ L_{TL} is 0 \times 0 , x_{T} , y_{T} are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \rightarrow \FlaThreeByThreeBR{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \lambda_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \leftarrow \FlaThreeByThreeTL{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} y_0 \becomes \chi_1 l_{01} + y_0 \\ \psi_1 \becomes \chi_1 \lambda_{11} + \psi_1 \\ y_2 \becomes \chi_1 l_{21} + y_2 \\ \end{array} } \FlaAlgorithm \end{equation*}

Figure 4.3.2.2. Algorithms to be used in Homework~\ref{hw:4322

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y \becomes \mbox{Trmvp_un_unb_var2}( U, x, y )} % Step 3: Loop-guard \renewcommand{\guard}{ m( U_{TL} ) \lt m( U ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ U \rightarrow \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ U_{TL} is 0 \times 0 , x_{T} , y_T are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \rightarrow \FlaThreeByThreeBR{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{U_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \upsilon_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{U_{TL}}{U_{TR}} {U_{BL}}{U_{BR}} \leftarrow \FlaThreeByThreeTL{U_{00}}{u_{01}}{U_{02}} {u_{10}^T}{\upsilon_{11}}{u_{12}^T} {U_{20}}{u_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} y_0 \becomes \chi_1 u_{01} + y_0 \\ \psi_1 \becomes \chi_1 \upsilon_{11} + \psi_1 \\ \phantom{ y_2 \becomes \chi_1 u_{21} + y_2} \end{array} } \FlaAlgorithm \end{equation*}

Figure 4.3.2.3. Algorithms to be used in Homework~\ref{hw:4324

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y := \mbox{Trmvp\_ln\_unb\_var1}( L, x, y ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( L_{TL} ) \lt m( L ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ L \rightarrow \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ L_{TL} is 0 \times 0 , x_{T} , y_{T} are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \rightarrow \FlaThreeByThreeBR{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \lambda_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{L_{TL}}{L_{TR}} {L_{BL}}{L_{BR}} \leftarrow \FlaThreeByThreeTL{L_{00}}{l_{01}}{L_{02}} {l_{10}^T}{\lambda_{11}}{l_{12}^T} {L_{20}}{l_{21}}{L_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} \phantom{y_0 \becomes \chi_1 l_{01} + y_0} \\ \psi_1 \becomes l_{10}^T x_0 + \lambda_{11} \chi_1 + l_{12}^T x_2 + \psi_1 \\ \phantom{y_0 \becomes \chi_1 l_{01} + y_0} \\ \end{array} } \FlaAlgorithm \end{equation*}

Figure 4.3.2.4. Algorithms to be used in Homework~\ref{hw:4326

Let us analyze the algorithms for computing \(y := U x + y \text{.}\) (The analysis of all the other algorithms is very similar.) For the dot product based algorithm, the cost is in the update \(\psi_1 \becomes \upsilon_{11} \chi_1 + u_{12}^T x_2 + \psi_1 \) which is typically computed in two steps:

\(\psi_1 \becomes \upsilon_{11} \chi_1 + \psi_1 \text{;}\) followed by
a dot product \(\psi_1 \becomes u_{12}^T x_2 + \psi_1 \text{.}\)

Now, during the first iteration, \(u_{12}^T \) and \(x_2 \) are of size \(n-1 \text{,}\) so that that iteration requires \(2(n-1) + 2 = 2 n \) flops for the first step. During the \(k \)th iteration (starting with \(k = 0 \)), \(u_{12}^T \) and \(x_2 \) are of size \((n-k-1) \) so that the cost of that iteration is \(2 ( n-k ) \) flops. Thus, if \(A \) is an \(n \times n \) matrix, then the total cost is given by

\begin{equation*} \sum_{k=0}^{n-1} \left[ 2 (n-k) \right] = 2 \sum_{k=0}^{n-1} (n-k) = 2 ( n + (n-1) + \cdots + 1 ) = 2 \sum_{k=1}^{n} k = 2 (n+1) n / 2. \end{equation*}

flops. (Recall that we proved in the second week that \(\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \text{.}\))