Subsection 3.2.4 Modified Gram-Schmidt (MGS)

In the video, we reasoned that the following two algorithms compute the same values, except that in the second the columns of \(Q \) overwrite the corresponding columns of \(A \text{:}\)

\(\begin{array}{l} {\bf for~} j=0, \ldots, n-1 \\ ~~~ a_j^\perp := a_j \\ ~~~ {\bf for~} k=0, \dots, j-1 \\ ~~~ ~~~ \rho_{k,j} := q_k^H a_j^\perp \\ ~~~ ~~~ a_j^\perp := a_j^\perp - \rho_{k,j} q_k \\ ~~~ {\bf end} \\ ~~~ \rho_{j,j} := \| {a}_j^\perp \|_2 \\ ~~~ q_j := a_j^\perp / \rho_{j,j} \\ {\bf end} \end{array}\)

\(~\)

\(\begin{array}{l} {\bf for~} j=0, \ldots, n-1 \\ ~~\\ ~~~ {\bf for~} k=0, \dots, j-1 \\ ~~~ ~~~ \rho_{k,j} := a_k^H a_j \\ ~~~ ~~~ a_j := a_j - \rho_{k,j} a_k \\ ~~~ {\bf end} \\ ~~~ \rho_{j,j} := \| {a}_j \|_2 \\ ~~~ a_j := a_j / \rho_{j,j} \\ {\bf end} \end{array}\)

(a) MGS algorithm that computes \(Q \) and \(R \) from \(A \text{.}\)

\(~\)

(b) MGS algorithm that computes \(Q \) and \(R \) from \(A \text{,}\) overwriting \(A \) with \(Q\text{.}\)
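To make this concrete, here is a minimal MATLAB sketch of algorithm (b); the name MGS_simple is illustrative (not part of the course materials), and MATLAB indexes from 1 where the algorithm indexes from 0:

function [ A, R ] = MGS_simple( A )
    % Sketch of algorithm (b): MGS that overwrites A with Q.
    [ ~, n ] = size( A );
    R = zeros( n, n );
    for j = 1:n
        for k = 1:j-1
            R( k, j ) = A( :, k )' * A( :, j );        % rho_{k,j} := q_k^H a_j
            A( :, j ) = A( :, j ) - R( k, j ) * A( :, k );
        end
        R( j, j ) = norm( A( :, j ) );                 % rho_{j,j} := || a_j ||_2
        A( :, j ) = A( :, j ) / R( j, j );             % a_j becomes q_j
    end
end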

Homework 3.2.4.1.

Assume that \(q_0, \ldots, q_{k-1} \) are mutually orthonormal. Let \(\rho_{j,k} = q_j^H y \) for \(j = 0, \ldots, k-1 \text{.}\) Show that

\begin{equation*} \begin{array}[t]{c} \underbrace{ q_i^H y } \\ \rho_{i,k} \end{array} = q_i^H \left( y - \rho_{0,k} q_0 - \cdots - \rho_{i-1,k} q_{i-1} \right) \end{equation*}

for \(i = 0, \ldots, k-1 \text{.}\)

Solution
\begin{equation*} \begin{array}{l} q_i^H \left( y - \rho_{0,k} q_0 - \cdots - \rho_{i-1,k} q_{i-1} \right) \\ ~~~=~~~~ \lt \mbox{ distribute } \gt \\ q_i^H y - q_i^H \rho_{0,k} q_0 - \cdots - q_i^H \rho_{i-1,k} q_{i-1} \\ ~~~=~~~~ \lt \mbox{ the } \rho_{j,k} \mbox{ are scalars } \gt \\ q_i^H y - \rho_{0,k} \begin{array}[t]{c} \underbrace{ q_i^H q_0} \\ 0 \end{array} - \cdots - \rho_{i-1,k} \begin{array}[t]{c} \underbrace{ q_i^H q_{i-1} } \\ 0 \end{array} \\ ~~~=~~~~ \lt \mbox{ the remaining terms are zero } \gt \\ q_i^H y. \end{array} \end{equation*}

This homework illustrates how, given a vector \(y \in \Cm \) and a matrix \(Q \in \C^{m \times k} \text{,}\) the component of \(y \) orthogonal to the column space of \(Q \text{,}\) given by \(( I - Q Q^H ) y \text{,}\) can be computed by either of the two algorithms given in Figure 3.2.4.1. The one on the left, \(\mbox{Proj}\perp Q_{\rm CGS}( Q, y ) \text{,}\) projects \(y \) onto the space orthogonal to the columns of \(Q \) as did the Gram-Schmidt algorithm with which we started. The one on the right successively subtracts out the component in the direction of \(q_i \) using a vector that has been updated in previous iterations (and hence is already orthogonal to \(q_0, \ldots, q_{i-1} \)). The algorithm on the right is one variant of the Modified Gram-Schmidt (MGS) algorithm.

\begin{equation*} \begin{array}{|l | l|} \hline [ y^\perp, r ] = {\rm Proj\!\!\perp\!\! Q}_{\rm CGS}( Q, y ) ~~~~~~~~~~~~ \amp [ y^\perp, r ] = {\rm Proj\!\!\perp\!\! Q}_{\rm MGS}( Q, y ) ~~~~~~~~~~~~ \\ {\rm (used~by~CGS)} \amp {\rm (used~by~MGS)} \\ \hline y^\perp = y \amp y^\perp = y \\ {\bf for~} i=0, \ldots, k-1 \amp {\bf for~} i=0, \ldots, k-1 \\ ~~~\rho_i := q_i^H y \amp ~~~ \rho_i := q_i^H y^\perp \\ ~~~ y^\perp := y^\perp - \rho_i q_i \amp ~~~ y^\perp := y^\perp - \rho_i q_i \\ {\bf endfor} \amp {\bf endfor} \\ \hline \end{array} \end{equation*}
Figure 3.2.4.1. Two different ways of computing \(y^\perp = ( I - Q Q^H ) y = y - Q r \text{,}\) where \(r = Q^H y \text{.}\) The computed \(y^\perp \) is the component of \(y \) orthogonal to \({\cal C}(Q) \text{,}\) where \(Q \) has \(k \) orthonormal columns. (Notice the \(y \) on the left versus the \(y^\perp \) on the right in the computation of \(\rho_i \text{.}\))
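The difference is easy to overlook in code. Below is a sketch of both routines in MATLAB (illustrative names; in practice each function would live in its own file); the only difference is whether the inner product uses y or the updated yperp:

function [ yperp, r ] = ProjPerpQ_CGS( Q, y )
    % CGS variant: inner products use the original y.
    k = size( Q, 2 );
    r = zeros( k, 1 );
    yperp = y;
    for i = 1:k
        r( i ) = Q( :, i )' * y;                 % note: y, not yperp
        yperp = yperp - r( i ) * Q( :, i );
    end
end

function [ yperp, r ] = ProjPerpQ_MGS( Q, y )
    % MGS variant: inner products use the updated yperp.
    k = size( Q, 2 );
    r = zeros( k, 1 );
    yperp = y;
    for i = 1:k
        r( i ) = Q( :, i )' * yperp;             % note: yperp
        yperp = yperp - r( i ) * Q( :, i );
    end
end

In exact arithmetic the two compute the same result; the distinction matters in floating point arithmetic, where yperp has already had earlier components subtracted out.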

These insights allow us to present CGS and this variant of MGS in FLAME notation, in Figure 3.2.4.2 (left and middle).

\begin{equation*}
\newcommand{\operation}{ \left[Q, R \right] := \mbox{Gram-Schmidt}( A ) }
\newcommand{\routinename}{ \left[ A, R \right] := \mbox{GS}( A ) \quad \mbox{ (overwrites $ A $ with $ Q $)}}
% Step 3: Loop-guard
\newcommand{\guard}{ n( A_L ) \lt n( A ) }
% Step 4: Initialize
\newcommand{\partitionings}{ A \rightarrow \FlaOneByTwo{A_L}{A_R} , R \rightarrow \FlaTwoByTwo{R_{TL}}{R_{TR}} {0}{R_{BR}} }
\newcommand{\partitionsizes}{ A_L {\rm ~has~} 0 {\rm ~columns~and~} R_{TL} {\rm ~is~} 0 \times 0 }
% Step 5a: Repartition the operands
\newcommand{\repartitionings}{ \FlaOneByTwo{A_L}{A_R} \rightarrow \FlaOneByThreeR{A_0}{a_1}{A_2} , \FlaTwoByTwo{R_{TL}}{R_{TR}} {0}{R_{BR}} \rightarrow \FlaThreeByThreeBR{R_{00}}{r_{01}}{R_{02}} {0}{\rho_{11}}{r_{12}^T} {0}{0}{R_{22}} }
% Step 5b: Move the double lines
\newcommand{\moveboundaries}{ \FlaOneByTwo{A_L}{A_R} \leftarrow \FlaOneByThreeL{A_0}{a_1}{A_2} , \FlaTwoByTwo{R_{TL}}{R_{TR}} {0}{R_{BR}} \leftarrow \FlaThreeByThreeTL{R_{00}}{r_{01}}{R_{02}} {0}{\rho_{11}}{r_{12}^T} {0}{0}{R_{22}} }
% Step 8: Insert the updates required to change the
% state from that given in Step 6 to that given in Step 7
\newcommand{\update}{
\begin{array}{l | l | l}
\hline
{\rm CGS} \amp {\rm MGS} \amp {\rm MGS~(alternative)} \\
r_{01} := A_0^H a_1 \amp \amp \\
a_1 := a_1 - A_0 r_{01} \amp [ a_1, r_{01} ] = {\rm Proj\!\!\perp\!\! Q}_{\rm MGS}( A_0, a_1 ) \amp \\
\rho_{11} := \| a_1 \|_2 \amp \rho_{11} := \| a_1 \|_2 \amp \rho_{11} := \| a_1 \|_2 \\
a_1 := a_1 / \rho_{11} \amp a_1 := a_1 / \rho_{11} \amp a_1 := a_1 / \rho_{11} \\
\amp \amp r_{12}^T := a_1^H A_2 \\
\amp \amp A_2 := A_2 - a_1 r_{12}^T \\
\hline
\end{array}
}
\FlaAlgorithm % this command generates the worksheet using the
% commands that were defined between the \resetsteps
% and the worksheet command
\end{equation*}
Figure 3.2.4.2. Left: Classical Gram-Schmidt algorithm. Middle: Modified Gram-Schmidt algorithm. Right: Alternative Modified Gram-Schmidt algorithm. In this last algorithm, every time a new column, \(q_1 \text{,}\) of \(Q \) is computed, each column of \(A_2 \) is updated so that its component in the direction of \(q_1 \) is subtracted out. This means that at the start and finish of the current iteration, the columns of \(A_L \) are mutually orthonormal and the columns of \(A_R \) are orthogonal to the columns of \(A_L \text{.}\)

Next, we massage the MGS algorithm into the alternative MGS algorithmic variant given in Figure 3.2.4.2 (right).

The video discusses how MGS can be rearranged so that every time a new vector \(q_k \) is computed (overwriting \(a_k \)), the remaining vectors, \(\{ a_{k+1}, \ldots, a_{n-1} \} \text{,}\) can be updated by subtracting out the component in the direction of \(q_k \text{.}\) This is also illustrated through the next sequence of equivalent algorithms.

\(\begin{array}{l} {\bf for~} j=0, \ldots, n-1 \\ ~~~ \rho_{j,j} := \| {a}_j \|_2 \\ ~~~ a_j := a_j / \rho_{j,j} \\ ~~~ {\bf for~} k=j+1, \dots, n-1 \\ ~~~ ~~~ \rho_{j,k} := a_j^H a_k \\ ~~~ ~~~ a_k := a_k - \rho_{j,k} a_j \\ ~~~ {\bf end} \\ {\bf end} \end{array}\)

\(~\)

\(\begin{array}{l} {\bf for~} j=0, \ldots, n-1 \\ ~~~ \rho_{j,j} := \| {a}_j \|_2 \\ ~~~ a_j := a_j / \rho_{j,j} \\ ~~~ {\bf for~} k=j+1, \dots, n-1 \\ ~~~ ~~~ \rho_{j,k} := a_j^H a_k \\ ~~~ {\bf end} \\ ~~~ {\bf for~} k=j+1, \dots, n-1 \\ ~~~ ~~~ a_k := a_k - \rho_{j,k} a_j \\ ~~~ {\bf end} \\ {\bf end} \end{array}\)

(c) MGS algorithm that normalizes the \(j \)th column to have unit length to compute \(q_j \) (overwriting \(a_j \) with the result) and then subtracts the component in the direction of \(q_j \) from the rest of the columns \((a_{j+1}, \ldots , a_{n-1})\text{.}\)

\(~\)

(d) Slight modification of the algorithm in (c) that computes \(\rho_{j,k} \) in a separate loop.

\(\begin{array}{l} {\bf for~} j=0, \ldots, n-1 \\ ~~~ \rho_{j,j} := \| {a}_j \|_2 \\ ~~~ a_j := a_j / \rho_{j,j} \\ ~~~ ~~~ \left( \begin{array}{c | c | c} \rho_{j,j+1} \amp \cdots \amp \rho_{j,n-1} \end{array} \right) := \\ ~~~ ~~~ a_j^H \left( \begin{array}{c | c | c} a_{j+1} \amp \cdots \amp a_{n-1} \end{array} \right) \\ ~~~ \left( \begin{array}{c | c | c} a_{j+1} \amp \cdots \amp a_{n-1} \end{array} \right) := \\ ~~~ ~~~ \left( \begin{array}{c | c | c} a_{j+1} - \rho_{j,j+1} a_j \amp \cdots \amp a_{n-1} - \rho_{j,n-1} a_j \end{array} \right) \\ {\bf end} \end{array}\)

\(~\)

\(\begin{array}{l} {\bf for~} j=0, \ldots, n-1 \\ ~~~ \rho_{j,j} := \| {a}_j \|_2 \\ ~~~ a_j := a_j / \rho_{j,j} \\ ~~~ \left( \begin{array}{c | c | c} \rho_{j,j+1} \amp \cdots \amp \rho_{j,n-1} \end{array} \right) := \\ ~~~ ~~~ a_j^H \left( \begin{array}{c | c | c} a_{j+1} \amp \cdots \amp a_{n-1} \end{array} \right) \\ ~~~ \left( \begin{array}{c | c | c} a_{j+1} \amp \cdots \amp a_{n-1} \end{array} \right):= \\ ~~~ ~~~ \left( \begin{array}{c | c | c} a_{j+1} \amp \cdots \amp a_{n-1} \end{array} \right) \\ ~~~ ~~~ ~~~ - a_j \left( \begin{array}{c | c | c} \rho_{j,j+1} \amp \cdots \amp \rho_{j,n-1} \end{array} \right) \\ {\bf end} \end{array}\)

(e) Algorithm in (d) rewritten to expose only the outer loop.

\(~\)

(f) Algorithm in (e) rewritten to expose the row-vector-times matrix multiplication \(a_j^H \left( \begin{array}{c | c | c } a_{j+1} \amp \cdots \amp a_{n-1} \end{array} \right) \) and rank-1 update \(\left( \begin{array}{c | c | c } a_{j+1} \amp \cdots \amp a_{n-1} \end{array} \right) - a_j \left( \begin{array}{c | c | c } \rho_{j,j+1} \amp \cdots \amp \rho_{j,n-1} \end{array} \right) \text{.}\)

Figure 3.2.4.3. Various equivalent MGS algorithms.

This discussion shows that the updating of future columns by subtracting out the component in the direction of the latest column of \(Q \) to be computed can be cast in terms of a rank-1 update. This is also captured, using FLAME notation, in the algorithm in Figure 3.2.4.2, as is further illustrated in Figure 3.2.4.4:

Figure 3.2.4.4. Alternative Modified Gram-Schmidt algorithm for computing the QR factorization of a matrix \(A \text{.}\)
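As a sketch of how algorithm (f) (equivalently, the rightmost column of Figure 3.2.4.2) might be coded in MATLAB, where the row-vector-times-matrix multiplication and rank-1 update become submatrix operations (the name MGS_alt is illustrative, and indexing starts at 1):

function [ A, R ] = MGS_alt( A )
    % Sketch of the alternative (right-looking) MGS variant.
    [ ~, n ] = size( A );
    R = zeros( n, n );
    for j = 1:n
        R( j, j ) = norm( A( :, j ) );
        A( :, j ) = A( :, j ) / R( j, j );                % a_j becomes q_j
        R( j, j+1:n ) = A( :, j )' * A( :, j+1:n );       % row vector times matrix
        A( :, j+1:n ) = A( :, j+1:n ) - A( :, j ) * R( j, j+1:n );  % rank-1 update
    end
end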

Ponder This 3.2.4.2.

Let \(A \) have linearly independent columns and let \(A = Q R \) be a QR factorization of \(A \text{.}\) Partition

\begin{equation*} A \rightarrow \FlaOneByTwo{A_L}{A_R}, \quad Q \rightarrow \FlaOneByTwo{Q_L}{Q_R}, {\rm ~~~and~~~} R \rightarrow \FlaTwoByTwo {R_{TL}}{R_{TR}} { 0 }{R_{BR}}, \end{equation*}

where \(A_L \) and \(Q_L \) have \(k \) columns and \(R_{TL} \) is \(k \times k \text{.}\)

As you prove the following insights, relate each to the algorithm in Figure 3.2.4.4. In particular, at the top of the loop of a typical iteration, how have the different parts of \(A \) and \(R \) been updated?

  1. \(A_L = Q_L R_{TL}\text{.}\)

    (\(Q_L R_{TL} \) is a QR factorization of \(A_L \text{.}\))

  2. \({\cal C}( A_L ) = {\cal C}( Q_L ) \text{.}\)

    (The first \(k\) columns of \(Q \) form an orthonormal basis for the space spanned by the first \(k \) columns of \(A \text{.}\))

  3. \(R_{TR} = Q_L^H A_R\text{.}\)

  4. \((A_R - Q_L R_{TR} )^H Q_L = 0 \text{.}\)

    (Each column in \(A_R - Q_L R_{TR}\) equals the component of the corresponding column of \(A_R \) that is orthogonal to \(\Span( Q_L ) \text{.}\))

  5. \({\cal C}( A_R - Q_L R_{TR} ) = {\cal C}( Q_R ) \text{.}\)

  6. \(A_R - Q_L R_{TR} = Q_R R_{BR} \text{.}\)

    (The columns of \(Q_R \) form an orthonormal basis for the column space of \(A_R - Q_L R_{TR} \text{.}\))

Solution

Consider the fact that \(A = Q R \text{.}\) Then, multiplying the partitioned matrices,

\begin{equation*} \begin{array}{rcl} \FlaOneByTwo{A_L}{A_R} \amp = \amp \FlaOneByTwo{Q_L}{Q_R} \FlaTwoByTwo {R_{TL}}{R_{TR}} { 0 }{R_{BR}} \\ \amp = \amp \FlaOneByTwo{Q_L R_{TL}}{Q_L R_{TR} + Q_R R_{BR}}. \end{array} \end{equation*}

Hence

\begin{equation} A_L = Q_L R_{TL} \quad \mbox{and} \quad A_R = Q_L R_{TR} + Q_R R_{BR}.\label{MGS-eqn-1}\tag{3.2.2} \end{equation}
  1. The left equality in (3.2.2) answers 1.

  2. \({\cal C}( A_L ) = {\cal C}( Q_L ) \) can be shown by noting that \(R \) is upper triangular and nonsingular and hence \(R_{TL} \) is upper triangular and nonsingular, and using this to show that \({\cal C}( A_L ) \subset {\cal C}( Q_L ) \) and \({\cal C}( Q_L ) \subset {\cal C}( A_L ) \text{:}\)

    • \({\cal C}( A_L ) \subset {\cal C}( Q_L ) \text{:}\) Let \(y \in {\cal C}( A_L ) \text{.}\) Then there exists \(x \) such that \(A_L x = y \text{.}\) But then \(Q_L R_{TL} x = y \) and hence \(Q_L( R_{TL} x ) = y \) which means that \(y \in {\cal C}( Q_L ) \text{.}\)

    • \({\cal C}( Q_L ) \subset {\cal C}( A_L ) \text{:}\) Let \(y \in {\cal C}( Q_L ) \text{.}\) Then there exists \(x \) such that \(Q_L x = y \text{.}\) But then \(A_L R_{TL}^{-1} x = y \) and hence \(A_L( R_{TL}^{-1} x ) = y \) which means that \(y \in {\cal C}( A_L ) \text{.}\)

    This answers 2.

  3. Take the right equality in (3.2.2), rearranged as \(A_R - Q_L R_{TR} = Q_R R_{BR} \text{,}\) and multiply both sides by \(Q_L^H \text{:}\)

    \begin{equation*} Q_L^H( A_R - Q_L R_{TR} )= Q_L^H Q_R R_{BR} \end{equation*}

    is equivalent to

    \begin{equation*} Q_L^H A_R - \begin{array}[t]{c} \underbrace{ Q_L^H Q_L } \\ I \end{array} R_{TR} = \begin{array}[t]{c} \underbrace{ Q_L^H Q_R } \\ 0 \end{array} R_{BR} = 0. \end{equation*}

    Rearranging yields 3.

  4. Since \(A_R - Q_L R_{TR} = Q_R R_{BR} \) we find that \(( A_R - Q_L R_{TR} )^H Q_L = ( Q_R R_{BR} )^H Q_L \) and

    \begin{equation*} ( A_R - Q_L R_{TR} )^H Q_L = R_{BR}^H Q_R^H Q_L = 0. \end{equation*}
  5. Similar to the proof of 2., starting from \(A_R - Q_L R_{TR} = Q_R R_{BR} \) and using that \(R_{BR} \) is upper triangular and nonsingular.

  6. Rearranging the right equality in (3.2.2) yields \(A_R - Q_L R_{TR} = Q_R R_{BR} \text{,}\) which answers 6.

  7. Letting \(\widehat A \) denote the original contents of \(A \text{,}\) at a typical point,

    • \(A_L \) has been updated with \(Q_L \text{.}\)

    • \(R_{TL} \) and \(R_{TR} \) have been computed.

    • \(A_R = \widehat A_R - Q_L R_{TR} \text{.}\)
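The equalities among these insights can also be checked numerically. The following MATLAB snippet is purely illustrative (any QR factorization of a random \(A \) will do; all printed norms should be of the order of machine precision):

m = 7; n = 5; k = 3;
A = randn( m, n );
[ Q, R ] = qr( A, 0 );                     % economy-size QR factorization
AL  = A( :, 1:k );      AR  = A( :, k+1:n );
QL  = Q( :, 1:k );      QRr = Q( :, k+1:n );
RTL = R( 1:k, 1:k );    RTR = R( 1:k, k+1:n );    RBR = R( k+1:n, k+1:n );
norm( AL - QL * RTL )                      % insight 1
norm( RTR - QL' * AR )                     % insight 3
norm( ( AR - QL * RTR )' * QL )            % insight 4
norm( AR - QL * RTR - QRr * RBR )          % insight 6

(Insights 2. and 5. are statements about equality of subspaces and are implied by 1. and 6. together with the nonsingularity of \(R_{TL} \) and \(R_{BR} \text{.}\))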

Homework 3.2.4.3.

Implement the algorithm in Figure 3.2.4.4 as

function [ Aout, Rout ] = MGS_QR( A, R )
Input is an \(m \times n \) matrix A and an \(n \times n \) matrix \(R \text{.}\) Output is the matrix \(Q \text{,}\) which has overwritten matrix A, and the upper triangular matrix \(R \text{.}\) (The values of \(R \) below the diagonal can be arbitrary.) You may want to use Assignments/Week03/matlab/test_MGS_QR.m to check your implementation.
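A quick sanity check of an implementation might look as follows (a sketch, assuming MGS_QR has been written; all names other than MGS_QR are illustrative):

m = 6; n = 4;
A = randn( m, n );
[ Q, R ] = MGS_QR( A, zeros( n, n ) );
norm( Q' * Q - eye( n ) )                  % ~ 0 if columns of Q are orthonormal
norm( A - Q * triu( R ) )                  % ~ 0 if A = Q R (ignore below-diagonal R)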