LAFF Symmetric matrix-vector multiplication

Subsection 4.3.3 Symmetric matrix-vector multiplication

Homework 4.3.3.1.

Write routines

[ y_out ] = Symv_u_unb_var1 ( A, x, y )
[ y_out ] = Symv_u_unb_var2( A, x, y )

that implement the algorithms in Figure~\ref{fig-symm_u}.

Homework 4.3.3.2.

\label{hw:4332} Modify the algorithms in Figure~\ref{fig-4332} to compute \(y \becomes A x + y \text{,}\) where \(A \) is symmetric and stored in the lower triangular part of matrix. You may want to do this in conjunction with the next exercise.

Solution

In the algorithm on the left, the update becomes \(\psi_1 := a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{21}^T x_2 + \psi_1 \text{.}\) In the algorithm on the right, the first update becomes \(y_0 := \chi_1 ( a_{10}^T )^T + y_0 \text{.}\)

Homework 4.3.3.3.

Write routines

[ y_out ] = Symv_l_unb_var1 ( A, x, y )
[ y_out ] = Symv_l_unb_var2( A, x, y )

that implement the algorithms from the previous homework.

Homework 4.3.3.4.

As hinted at before: Implementations achieve better performance (finish faster) if one accesses data consecutively in memory. Now, most scientific computing codes store matrices in "column-major order" which means that the first column of a matrix is stored consecutively in memory, then the second column, and so forth. Now, this means that an algorithm that accesses a matrix by columns tends to be faster than an algorithm that accesses a matrix by rows. That, in turn, means that when one is presented with more than one algorithm, one should pick the algorithm that accesses the matrix by columns. Our FLAME notation makes it easy to recognize algorithms that access the matrix by columns.

The problem with the algorithms in this unit is that all of them access both part of a row AND part of a column. So, your challenge is to devise an algorithm for computing \(y := A x + y \) where \(A \) is symmetric and only stored in one half of the matrix that only accesses parts of columns. We will call these "variant 3". Then, write routines

[ y_out ] = Symv_u_unb_var3 ( A, x, y )
[ y_out ] = Symv_l_unb_var3( A, x, y )

Hint

Let's focus on the case where only the lower triangular part of \(A \) is stored.

Solution

If \(A \) is symmetric, then \(A = L + \widehat L^T \) where \(L \) is the lower triangular part of \(A \) and \(\widehat L \) is the strictly lower triangular part of \(A \text{.}\)
Identify an algorithm for \(y := L x + y \) that accesses matrix \(A \) by columns.
Identify an algorithm for \(y := \widehat L^T x + y \) that accesses matrix \(A \) by columns.
You now have two loops that together compute \(y := A x + y = ( L +\widehat L^T ) x + y = L x +\widehat L^T x + y \text{.}\)
Can you "merge" the loops into one loop?

\pythonversion{ You can use the IPython Notebook \\ \mbox4.3.3.4 Symmetric Matrix-Vector Multiplication Routines Challenge Question \\ to see if you got the algorithms right.

Consider

\begin{equation*} \FlaThreeByThreeBR {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , = \left( \begin{array}{r r I r | r r} -1 \amp 2 \amp 4 \amp 1 \amp 0\\ 2 \amp 0 \amp -1 \amp -2 \amp 1 \\ \whline 4 \amp -1 \amp 3 \amp 1 \amp 2\\ \hline 1 \amp -2 \amp 1 \amp 4 \amp 3 \\ 0 \amp 1 \amp 2 \amp 3 \amp 2 \end{array} \right). \end{equation*}

Here we purposely chose the matrix on the right to be symmetric. We notice that \(a_{10}^T = a_{01} \text{,}\) \(A_{20}^T = A_{02} \text{,}\) and \(a_{12}^T = a_{21} \text{.}\) A moment of reflection will convince you that this is a general principle, when \(A_{00} \) is square. Moreover, notice that \(A_{00}\) and \(A_{22} \) are then symmetric as well.

Consider

\begin{equation*} A = \FlaTwoByTwo{ A_{TL}}{A_{TR}}{A_{BL}}{A_{BR}} = \FlaThreeByThreeBR {A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \end{equation*}

where \(A_{TL} \) and \(A_{00} \) are square matrices. If \(A \) is symmetric then

\(A_{TL} \text{,}\) \(A_{BR} \text{,}\) \(A_{00} \text{,}\) and \(A_{22} \) are symmetric;
\(a_{10}^T = a_{01}^T \) and \(a_{12}^T = a_{21}^T \text{;}\) and
\(A_{20} = A_{02}^T \text{.}\)

We will just state this as "intuitively obvious".

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y := \mbox{Symv_u_unb_var1}( A, x, y ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( A_{TL} ) \lt m( A ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ A_{TL} is 0 \times 0 , x_{T} , y_T are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \alpha_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} \psi_1 \becomes \begin{array}[b]{c} a_{01}^T \\ \overbrace{ \phantom{a_{10}^T} } \end{array} x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1 \\ ~ \end{array} } \FlaAlgorithm \end{equation*}

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y := \mbox{Symv_u_unb_var2}( A, x, y ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( A_{TL} ) \lt m( A ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ A_{TL} is 0 \times 0 , x_{T} , y_T are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \alpha_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} y_0 \becomes \chi_1 a_{01} + y_0 \\ \psi_1 \becomes \chi_1 \alpha_{11} + \psi_1 \\ y_2 \becomes \chi_1 a_{12} + y_2 \end{array} } \FlaAlgorithm \end{equation*}

Figure 4.3.3.1. Algorithms for computing \(y := A x + y \) where \(A \) is symmetric, where only the upper triangular part of \(A \) is stored.

Consider computing \(y := A x + y \) where \(A \) is a symmetric matrix. Since the upper and lower triangular part of a symmetric matrix are simply the transpose of each other, it is only necessary to store half the matrix: only the upper triangular part or only the lower triangular part. In Figure~\ref{fig-symm_u} we repeat the algorithms for matrix-vector multiplication from an earlier unit, and annotate them for the case where \(A \) is symmetric and only stored in the upper triangle. The change is simple: \(a_{10} \) and \(a_{21} \) are not stored and thus

For the left algorithm, the update \(\psi_1 \becomes a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1 \) must be changed to \(\psi_1 \becomes a_{01}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1 \text{.}\)
For the algorithm on the right, the update \(y_2 \becomes \chi_1 a_{21} + y_2 \) must be changed to \(y_2 \becomes \chi_1 a_{12} + y_2\) (or, more precisely, \(y_2 \becomes \chi_1 ( a_{12}^T )^T + y_2 \) since \(a_{12}^T \) is the label for part of a row).

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y := \mbox{Symv_l_unb_var1}( A, x, y ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( A_{TL} ) \lt m( A ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ A_{TL} is 0 \times 0 , x_{T} , y_T are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \alpha_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} ~ \\ \psi_1 \becomes a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1 \\ ~ \end{array} } \FlaAlgorithm \end{equation*}

\begin{equation*} % Define the operation to be computed \renewcommand{\routinename}{ y := \mbox{Symv_l_unb_var2}( A, x, y ) } % Step 3: Loop-guard \renewcommand{\guard}{ m( A_{TL} ) \lt m( A ) } % Step 4: Define Initialize \renewcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} , \\ x \rightarrow \FlaTwoByOne{x_{T}} {x_{B}} , y \rightarrow \FlaTwoByOne{y_{T}} {y_{B}} } \renewcommand{\partitionsizes}{ A_{TL} is 0 \times 0 , x_{T} , y_T are 0 \times 1 } % Step 5a: Repartition the operands \renewcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \rightarrow \FlaThreeByOneB{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \rightarrow \FlaThreeByOneB{y_0} {\psi_1} {y_2} } \renewcommand{\repartitionsizes}{ \alpha_{11} , \chi_1 , and \psi_1 are scalars} % Step 5b: Move the double lines \renewcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}} , \\ \FlaTwoByOne{ x_T } { x_B } \leftarrow \FlaThreeByOneT{x_0} {\chi_1} {x_2} , \FlaTwoByOne{ y_T } { y_B } \leftarrow \FlaThreeByOneT{y_0} {\psi_1} {y_2} } % Step 8: Insert the updates required to change the % state from that given in Step 6 to that given in Step 7 % Note: The below needs editing!!! \renewcommand{\update}{ \begin{array}{l} y_0 \becomes \chi_1 a_{01} + y_0 \\ \psi_1 \becomes \chi_1 \alpha_{11} + \psi_1 \\ y_2 \becomes \chi_1 a_{21} + y_2 \end{array} } \FlaAlgorithm \end{equation*}

Figure 4.3.3.2. Algorithms for Homework~\ref{hw:4332}