Subsection 4.2.3 Matrix-vector multiplication, again

Homework 4.2.3.1.

Implement routines

  • [ y_out ] = Mvmult_n_unb_var1B( A, x, y )
    

  • [ y_out ] = Mvmult_n_unb_var2B( A, x, y )
    

that compute \(y := A x + y \) via the algorithms in Figure~\ref{fig-mvmult_alt}.

In the next few units, we will modify the matrix-vector multiplication algorithms from last week so that they can take advantage of matrices with special structure (e.g., triangular or symmetric matrices).
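The two homework routines can be sketched in Python/NumPy. This is a hedged stand-in, not the course's official MATLAB/FLAME implementation: the function names, the slicing, and the use of NumPy arrays are all illustrative assumptions. Each routine traverses \(A\) using the repartitioning of Figure 4.2.3.2, with `A[i, :i]`, `A[i, i]`, `A[i, i+1:]` playing the roles of \(a_{10}^T\), \(\alpha_{11}\), \(a_{12}^T\).

```python
import numpy as np

def mvmult_n_unb_var1b(A, x, y):
    """y := A x + y, dot-product based, for square A.
    Illustrative Python stand-in for Mvmult_n_unb_var1B."""
    m = A.shape[0]
    y = y.copy()
    for i in range(m):
        # Split row i as ( a10^T | alpha11 | a12^T ) and x as ( x0 | chi1 | x2 ):
        # psi1 := a10^T x0 + alpha11 chi1 + a12^T x2 + psi1
        y[i] += A[i, :i] @ x[:i] + A[i, i] * x[i] + A[i, i+1:] @ x[i+1:]
    return y

def mvmult_n_unb_var2b(A, x, y):
    """y := A x + y, axpy based, for square A.
    Illustrative Python stand-in for Mvmult_n_unb_var2B."""
    m = A.shape[0]
    y = y.copy()
    for j in range(m):
        chi1 = x[j]
        # Split column j as ( a01 / alpha11 / a21 ) and update y piecewise:
        y[:j]   += chi1 * A[:j, j]     # y0   := chi1 a01     + y0
        y[j]    += chi1 * A[j, j]      # psi1 := chi1 alpha11 + psi1
        y[j+1:] += chi1 * A[j+1:, j]   # y2   := chi1 a21     + y2
    return y
```

Both loops reproduce \(y := A x + y\); the payoff of the split updates comes later, when some of the subparts are known to be zero or stored only implicitly.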

\begin{equation*} \renewcommand{\routinename}{ \left[ A \right] := \mbox{Algorithm_Skeleton}( A ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( A_{TL} ) \lt m( A ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} } \renewcommand{\partitionsizes}{ A_{TL} is 0 \times 0 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} } \renewcommand{\repartitionsizes}{ \alpha_{11} is 1 \times 1 } % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l @{\hspace{1in}}} ~~\\ ~~\\ ~~ \end{array} } \FlaAlgorithm \end{equation*}
Figure 4.2.3.1. Code skeleton for algorithms when matrices are triangular or symmetric.

What makes a triangular or symmetric matrix special? For one thing, it is square. For another, only one triangle of the matrix needs to be stored. It was for this reason that we ended up with "algorithm skeletons" that looked like the one in Figure~\ref{fig-skel1} when we presented algorithms for "triangularizing" or "symmetrizing" a matrix.

Now, consider a typical partitioning of a matrix that is encountered in such an algorithm:

\begin{equation*} \FlaThreeByThreeBR { A_{00} }{ a_{01} }{ A_{02} } { a_{10}^T }{ \alpha_{11} }{ a_{12}^T } { A_{20} }{ a_{21} }{ A_{22} } = \left( \begin{array}{c c I c | c c c} \times \amp \times \amp \times \amp \times \amp \times \amp \times \\ \times \amp \times \amp \times \amp \times \amp \times \amp \times \\ \whline \times \amp \times \amp \times \amp \times \amp \times \amp \times \\ \hline \times \amp \times \amp \times \amp \times \amp \times \amp \times \\ \times \amp \times \amp \times \amp \times \amp \times \amp \times \\ \times \amp \times \amp \times \amp \times \amp \times \amp \times \\ \end{array} \right), \end{equation*}

where each \(\times \) represents an entry in the matrix (in this case \(6 \times 6 \)). If, for example, the matrix is lower triangular,

\begin{equation*} \FlaThreeByThreeBR { A_{00} }{ a_{01} }{ A_{02} } { a_{10}^T }{ \alpha_{11} }{ a_{12}^T } { A_{20} }{ a_{21} }{ A_{22} } = \left( \begin{array}{c c I c | c c c} \times \amp 0 \amp 0 \amp 0 \amp 0 \amp 0 \\ \times \amp \times \amp 0 \amp 0 \amp 0 \amp 0 \\ \whline \times \amp \times \amp \times \amp 0 \amp 0 \amp 0 \\ \hline \times \amp \times \amp \times \amp \times \amp 0 \amp 0 \\ \times \amp \times \amp \times \amp \times \amp \times \amp 0 \\ \times \amp \times \amp \times \amp \times \amp \times \amp \times \\ \end{array} \right), \end{equation*}

then \(a_{01} = 0 \text{,}\) \(A_{02} = 0 \text{,}\) and \(a_{12}^T = 0\text{.}\) (Remember: the "\(0\)" is a matrix or vector "of appropriate size".) If instead the matrix is symmetric with only the lower triangular part stored, then \(a_{01} = \left( a_{10}^T \right)^T = a_{10} \text{,}\) \(A_{02} = A_{20}^T \text{,}\) and \(a_{12}^T = a_{21}^T \text{.}\)
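These block relations are easy to verify numerically. The following NumPy sketch (the size `n` and index `i` are arbitrary illustrative choices) extracts the blocks around \(\alpha_{11}\) from a lower-triangular matrix and checks that \(a_{01}\), \(A_{02}\), and \(a_{12}^T\) are zero, then builds a symmetric matrix from a lower triangle and checks \(a_{01} = a_{10}\) and \(A_{02} = A_{20}^T\):

```python
import numpy as np

n, i = 6, 2   # matrix size and position of the "current" row/column (arbitrary)
L = np.tril(np.arange(1., n * n + 1).reshape(n, n))   # a lower-triangular example

# The repartitioned blocks around alpha_11 = L[i, i]:
a01  = L[:i, i]        # entries above alpha_11 in its column
A02  = L[:i, i+1:]     # block above and to the right
a12t = L[i, i+1:]      # entries to the right of alpha_11 in its row
assert not a01.any() and not A02.any() and not a12t.any()   # all zero

# A symmetric matrix stored via its lower triangle:
S = L + np.tril(L, -1).T
assert np.array_equal(S[:i, i], S[i, :i])           # a01 == a10
assert np.array_equal(S[:i, i+1:], S[i+1:, :i].T)   # A02 == A20^T
```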

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y := \mbox{Mvmult\_n\_unb\_var1b}( A, x, y ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( A_{TL} ) \lt m( A ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ A_{TL} is 0 \times 0 , x_{T} , y_T are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \alpha_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} \psi_1 \becomes a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1 \phantom{ \FlaThreeByOneB{ y_0 }{\psi_1}{y_2} } \end{array} } \FlaAlgorithm \end{equation*}
\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y := \mbox{Mvmult\_n\_unb\_var2b}( A, x, y ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( A_{TL} ) \lt m( A ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ A_{TL} is 0 \times 0 , x_{T} , y_T are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \alpha_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} y_0 \becomes \chi_1 a_{01} + y_0 \\ \psi_1 \becomes \chi_1 \alpha_{11} + \psi_1 \\ y_2 \becomes \chi_1 a_{21} + y_2 \end{array} } \FlaAlgorithm \end{equation*}

Figure 4.2.3.2. Alternative algorithms for matrix-vector multiplication.

The above observation leads us to the matrix-vector multiplication algorithms for computing \(y := A x + y \) given in Figure~\ref{fig-mvmult_alt}. Note:

  • For the left algorithm, what was previously the "current" row in matrix \(A \text{,}\) \(a_{1}^T \text{,}\) is now viewed as consisting of three parts:

    \begin{equation*} a_{1}^T = \left( \begin{array}{c I c | c} a_{10}^T \amp \alpha_{11} \amp a_{12}^T \end{array} \right) \end{equation*}

    while the vector \(x \) is now also partitioned into three parts:

    \begin{equation*} x = \left( \begin{array}{c} x_0 \\ \whline \chi_1 \\ \hline x_2 \end{array} \right). \end{equation*}

    As we saw in the first week, the partitioned dot product becomes

    \begin{equation*} a_1^T x = \left( \begin{array}{c I c | c} a_{10}^T \amp \alpha_{11} \amp a_{12}^T \end{array} \right) \left( \begin{array}{c} x_0 \\ \whline \chi_1 \\ \hline x_2 \end{array} \right) = a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2, \end{equation*}

    which explains why the update

    \begin{equation*} \psi_1 := a_1^T x + \psi_1 \end{equation*}

    is now

    \begin{equation*} \psi_1 := a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1. \end{equation*}
  • Similarly, for the algorithm on the right, which is based on the matrix-vector multiplication algorithm that uses axpy operations, we note that

    \begin{equation*} y \becomes \chi_1 a_1 + y \end{equation*}

    is replaced by

    \begin{equation*} \left( \begin{array}{c} y_0 \\ \whline \psi_1 \\ \hline y_2 \end{array} \right) := \chi_1 \left( \begin{array}{c} a_{01} \\ \whline \alpha_{11} \\ \hline a_{21} \end{array} \right) + \left( \begin{array}{c} y_0 \\ \whline \psi_1 \\ \hline y_2 \end{array} \right) \end{equation*}

    which equals

    \begin{equation*} \left( \begin{array}{c} y_0 \\ \whline \psi_1 \\ \hline y_2 \end{array} \right) := \left( \begin{array}{c} \chi_1 a_{01} + y_0 \\ \whline \chi_1 \alpha_{11} + \psi_1 \\ \hline \chi_1 a_{21} + y_2 \end{array} \right) . \end{equation*}

    This explains the update

    \begin{equation*} \begin{array}{r c l} y_0 \amp := \amp \chi_1 a_{01} + y_0 \\ \psi_1 \amp := \amp \chi_1 \alpha_{11} + \psi_1 \\ y_2 \amp := \amp \chi_1 a_{21} + y_2. \end{array} \end{equation*}
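Both observations above can be checked with a few lines of NumPy. This is an illustrative sketch: the vectors and the split index `i` are arbitrary made-up values, and the variable names mirror the symbols in the text.

```python
import numpy as np

i = 2                                    # position of alpha_11 / chi_1 (arbitrary)
a1t = np.array([1., -2., 3., 0.5, 4.])   # the "current" row a_1^T
x   = np.array([2., 1., -1., 3., 0.])

a10t, alpha11, a12t = a1t[:i], a1t[i], a1t[i+1:]
x0,   chi1,    x2   = x[:i],   x[i],   x[i+1:]

# a_1^T x == a_10^T x_0 + alpha_11 chi_1 + a_12^T x_2
assert np.isclose(a1t @ x, a10t @ x0 + alpha11 * chi1 + a12t @ x2)

# The axpy y := chi_1 a_1 + y, performed piecewise on ( y_0 / psi_1 / y_2 ):
a1 = np.array([1., 0., 2., -1., 5.])
y  = np.array([1., 1., 1., 1., 1.])
y_split = np.concatenate([chi1 * a1[:i] + y[:i],        # y_0  := chi_1 a_01 + y_0
                          [chi1 * a1[i] + y[i]],        # psi_1 := chi_1 alpha_11 + psi_1
                          chi1 * a1[i+1:] + y[i+1:]])   # y_2  := chi_1 a_21 + y_2
assert np.allclose(y_split, chi1 * a1 + y)
```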

Now, for matrix-vector multiplication \(y := A x + y \text{,}\) it is not beneficial to break the computation up in this way. Typically, a single dot product is more efficient than multiple operations with subvectors. Similarly, one axpy is typically more efficient than multiple axpys. But the observations in this unit lay the foundation for modifying the algorithms to take advantage of special structure in the matrix, later this week.