
Unit 10.5.2 Summary

It is important to relate subspace iteration that starts with the identity matrix (below left) to a simple QR algorithm (below right):

\begin{equation*} \begin{array}{l} \widehat A^{(0)} := A \\ \widehat V^{(0)} := I \\ \widehat R^{(0)} := I \\ {\bf for~} k:=0, \ldots \\ ~~~( \widehat V^{(k+1)}, \widehat R^{(k+1)} ) := {\rm QR}( A \widehat V^{(k)} ) \\ ~~~ \widehat A^{(k+1)} := { \widehat V^{(k+1)}~}^H A \widehat V^{(k+1)} \\ {\bf endfor} \end{array} \end{equation*}

\begin{equation*} \begin{array}{l} A^{(0)} := A \phantom{\widehat A^{(0)}}\\ V^{(0)} := I \phantom{\widehat V^{(0)}}\\ R^{(0)} := I \phantom{\widehat R^{(0)}}\\ {\bf for~} k:=0, \ldots \\ ~~~( Q^{(k+1)}, R^{(k+1)} ) := {\rm QR}( A^{(k)} ) \\ ~~~ A^{(k+1)} := R^{(k+1)} Q^{(k+1)} \\ ~~~ V^{(k+1)} := V^{(k)} Q^{(k+1)} \\ {\bf endfor} \end{array} \end{equation*}

For these algorithms, the following observations hold:

  • \(A^{(k+1)} = {Q^{(k+1)}~}^H A^{(k)} Q^{(k+1)} \text{.}\)

  • \(\widehat A^{(k)} = A^{(k)}\text{.}\)

  • \(\widehat R^{(k)} = R^{(k)}\text{.}\)

  • \(\widehat V^{(k)} = V^{(k)} \text{.}\)

  • \(V^{(k)} = Q^{(0)} Q^{(1)} \cdots Q^{(k)}\text{.}\)

  • \(A^{k} = V^{(k)} R^{(k)} \cdots R^{(1)} R^{(0)}\) (Note: \(A^{k} \) here denotes \(A \) raised to the \(k \)th power.)

  • \(A^{k} = \begin{array}[t]{c} \underbrace{ Q^{(0)} Q^{(1)} \cdots Q^{(k)} } \\ \mbox{unitary } V^{(k)} \end{array} \begin{array}[t]{c} \underbrace{ R^{(k)} \cdots R^{(1)} R^{(0)} } \\ \mbox{upper triangular } \widetilde R^{(k)} \end{array}\) which exposes a QR factorization of \(A^{k} \text{.}\)

  • Partitioning \(V^{(k)} \) by columns

    \begin{equation*} V^{(k)} = \left( \begin{array}{c | c | c} v_0^{(k)} \amp \cdots \amp v_{m-1}^{(k)} \end{array} \right) \end{equation*}

    we notice that applying \(k \) iterations of the Power Method to vector \(e_0 \) yields

    \begin{equation*} A^k e_0 = V^{(k)} \widetilde R^{(k)} e_0 = V^{(k)} \widetilde \rho_{0,0}^{(k)} e_0 = \widetilde \rho_{0,0}^{(k)} V^{(k)} e_0 = \widetilde \rho_{0,0}^{(k)} v_0^{(k)} , \end{equation*}

    where \(\widetilde \rho_{0,0}^{(k)} \) is the \((0,0) \) entry in matrix \(\widetilde R^{(k)} \text{.}\) Thus, the first column of \(V^{(k)} \) equals a vector that would result from \(k \) iterations of the Power Method.

  • Similarly, the second column of \(V^{(k)} \) equals a vector that would result from \(k \) iterations of the Power Method, but orthogonal to \(v_0^{(k)} \text{.}\) And so forth for the remaining columns.
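The observations above can be verified with a short NumPy sketch (an illustration, not from the text): the simple QR algorithm accumulates a QR factorization of \(A^k \text{,}\) and the first column of \(V^{(k)} \) matches, up to sign, the normalized result of \(k \) steps of the Power Method applied to \(e_0 \text{.}\)

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

k = 8
Ak = A.copy()
V = np.eye(4)
Rt = np.eye(4)                     # accumulates R^(k) ... R^(1) R^(0)
for _ in range(k):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q
    V = V @ Q
    Rt = R @ Rt                    # new factor multiplies from the left

Apk = np.linalg.matrix_power(A, k)
print(np.allclose(Apk, V @ Rt))    # A^k = V^(k) Rtilde^(k)

p = Apk[:, 0]                      # A^k e0: k unnormalized Power Method steps
print(np.allclose(np.abs(p / np.linalg.norm(p)), np.abs(V[:, 0])))
```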

Observations:

  • \(A^{(k+1)} = Q^{(k+1)\,H} A^{(k)} Q^{(k+1)} \text{.}\) This means we can think of \(A^{(k+1)} \) as the matrix \(A^{(k)} \) but viewed in a new basis (namely the basis that consists of the columns of \(Q^{(k+1)} \)).

  • \(A^{(k+1)} = ( Q^{(0)} \cdots Q^{(k+1)})^H A \, Q^{(0)} \cdots Q^{(k+1)} = V^{(k+1)\,H} A V^{(k+1)} \text{.}\) This means we can think of \(A^{(k+1)} \) as the matrix \(A \) but viewed in a new basis (namely the basis that consists of the columns of \(V^{(k+1)} \)).

  • In each step, we compute

    \begin{equation*} ( Q^{(k+1)}, R^{(k+1)} ) = QR( A^{(k)} ) \end{equation*}

    which we can think of as

    \begin{equation*} ( Q^{(k+1)}, R^{(k+1)} )= QR( A^{(k)} \times I ) . \end{equation*}

    This suggests that in each iteration we perform one step of subspace iteration, but with matrix \(A^{(k)} \) and \(V = I \text{:}\)

    \begin{equation*} ( Q^{(k+1)}, R^{(k+1)} ) = QR( A^{(k)} V ) . \end{equation*}
  • The insight is that the QR algorithm is identical to subspace iteration, except that at each step we reorient the problem (express it in a new basis) and we restart it with \(V = I \text{.}\)

A simple shifted QR algorithm, annotated with the iteration index \(k \text{,}\) is given by

\begin{equation*} \begin{array}{l} A^{(0)} = A \\ V^{(0)} = I \\ R^{(0)} = I \\ {\bf for~} k:=0, \ldots \\ ~~~\mu_k = \alpha_{m-1,m-1}^{(k)} \\ ~~~( Q^{(k+1)}, R^{(k+1)} ) := {\rm QR}( A^{(k)} - \mu_k I ) \\ ~~~ A^{(k+1)} = R^{(k+1)} Q^{(k+1)} + \mu_k I \\ ~~~ V^{(k+1)} = V^{(k)} Q^{(k+1)} \\ {\bf endfor} \end{array} \end{equation*}

For this algorithm,

  • \(A^{(k+1)} = Q^{(k+1)\,H} A^{(k)} Q^{(k+1)} \text{.}\)

  • \(A^{(k+1)} = V^{(k+1)\,H} A V^{(k+1)} \text{.}\)

  • \(\displaystyle \begin{array}[t]{l} ( A - \mu_{k-1} I) ( A - \mu_{k-2} I) \cdots ( A - \mu_1 I) ( A - \mu_0 I) \\ ~~~~~= \begin{array}[t]{c} \underbrace{ Q^{(0)} Q^{(1)} \cdots Q^{(k)} } \\ \mbox{unitary} \end{array} \begin{array}[t]{c} \underbrace{ R^{(k)} \cdots R^{(1)} R^{(0)}. } \\ \mbox{upper triangular} \end{array} \end{array} \)
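These three properties can be checked with a NumPy sketch of the shifted algorithm (an illustration, not from the text): we accumulate the product of the shifted matrices alongside the \(Q \) and \(R \) factors and verify the factorization above, and we observe that the last off-diagonal row converges rapidly to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2
m = A.shape[0]

Ak = A.copy()
V = np.eye(m)    # accumulates Q^(0) Q^(1) ... Q^(k), with Q^(0) = I
Rt = np.eye(m)   # accumulates R^(k) ... R^(1) R^(0), with R^(0) = I
P = np.eye(m)    # the product (A - mu_{k-1} I) ... (A - mu_0 I)
for k in range(8):
    mu = Ak[-1, -1]                        # shift with the bottom-right entry
    Q, R = np.linalg.qr(Ak - mu * np.eye(m))
    Ak = R @ Q + mu * np.eye(m)
    V = V @ Q
    Rt = R @ Rt
    P = (A - mu * np.eye(m)) @ P

print(np.allclose(V.T @ A @ V, Ak))        # A^(k) = V^(k)H A V^(k)
print(np.allclose(P, V @ Rt))              # QR factorization of shifted product
print(np.linalg.norm(Ak[-1, :-1]) < 1e-6)  # last off-diagonal row -> 0
```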

If

\begin{equation*} A = \left( \begin{array}{c | c} A_{00} \amp 0 \\ \hline 0 \amp A_{11} \end{array} \right), A_{00} x = \lambda x, \mbox{ and } A_{11} y = \mu y, \end{equation*}

then

\begin{equation*} \left( \begin{array}{c | c} A_{00} \amp 0 \\ \hline 0 \amp A_{11} \end{array} \right) \left( \begin{array}{c} x \\ \hline 0 \end{array} \right) = \lambda \left( \begin{array}{c} x \\ \hline 0 \end{array} \right) \mbox{ and } \left( \begin{array}{c | c} A_{00} \amp 0 \\ \hline 0 \amp A_{11} \end{array} \right) \left( \begin{array}{c} 0 \\ \hline y \end{array} \right) = \mu \left( \begin{array}{c} 0 \\ \hline y \end{array} \right) . \end{equation*}

Hence \(\Lambda( A ) = \Lambda( A_{00} ) \cup \Lambda( A_{11} ) \) and eigenvectors of \(A \) can be easily constructed from eigenvalues of \(A_{00} \) and \(A_{11} \text{.}\)
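This can be illustrated with a small NumPy example (an illustration, not from the text): the spectrum of the block-diagonal matrix is the union of the blocks' spectra, and an eigenvector of a block extends to an eigenvector of the whole matrix by padding with zeros.

```python
import numpy as np

A00 = np.array([[2.0, 1.0],
                [1.0, 2.0]])           # eigenvalues 1 and 3
A11 = np.array([[5.0]])                # eigenvalue 5
A = np.block([[A00, np.zeros((2, 1))],
              [np.zeros((1, 2)), A11]])

lam, X = np.linalg.eigh(A00)
x = np.concatenate([X[:, 0], [0.0]])   # pad an eigenvector of A00 with a zero
print(np.allclose(A @ x, lam[0] * x))  # (x; 0) is an eigenvector of A

eigs = np.concatenate([np.linalg.eigvalsh(A00), np.linalg.eigvalsh(A11)])
print(np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(eigs)))
```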

Let \(A \in \Cmxm \) be a Hermitian matrix and \(V \in \Cmxm \) be a unitary matrix such that

\begin{equation*} V^H A V = \left( \begin{array}{c | c} A_{00} \amp 0 \\ \hline 0 \amp A_{11} \end{array} \right) . \end{equation*}

If \(V_{00} \) and \(V_{11} \) are unitary matrices such that \(V_{00}^H A_{00} V_{00} = \Lambda_0 \) and \(V_{11}^H A_{11} V_{11} = \Lambda_1 \) are both diagonal, then

\begin{equation*} \left( V \left( \begin{array}{c | c} V_{00} \amp 0 \\ \hline 0 \amp V_{11} \end{array} \right) \right)^H A \left( V \left( \begin{array}{c | c} V_{00} \amp 0 \\ \hline 0 \amp V_{11} \end{array} \right) \right) = \left( \begin{array}{c | c} \Lambda_{0} \amp 0 \\ \hline 0 \amp \Lambda_{1} \end{array} \right) . \end{equation*}

This observation allows one to deflate the algebraic eigenvalue problem when this special block structure is encountered.

For the simple shifted QR algorithm we typically expect convergence so that eventually

\begin{equation*} A^{(k)} = \left( \begin{array}{c | c} A_{00}^{(k)}\amp f_{01}^{(k)} \\ \hline {f_{01}^{(k)}~}^T \amp \alpha_{m-1,m-1}^{(k)} \end{array} \right) , \end{equation*}

where \(f_{01}^{(k)} \) is small. In other words,

\begin{equation*} A^{(k)} \approx \left( \begin{array}{c | c} A_{00}^{(k)}\amp 0 \\ \hline 0 \amp \alpha_{m-1,m-1}^{(k)} \end{array} \right) . \end{equation*}

Thus, once \(f_{01}^{(k)} \) is small enough, the algorithm can continue with \(A_{00}^{(k)} \text{,}\) deflating the problem into a small one.
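A deflating variant can be sketched in NumPy (an illustration, not from the text; `eigvals_shifted_qr` is a hypothetical routine name): once the off-diagonal part of the last row is negligible, \(\alpha_{m-1,m-1}^{(k)} \) is accepted as an eigenvalue and the iteration continues with \(A_{00}^{(k)} \text{.}\)

```python
import numpy as np

def eigvals_shifted_qr(A, tol=1e-12, max_iter=500):
    # Hypothetical sketch: shifted QR with deflation for real symmetric A.
    A = A.copy()
    eigs = []
    for _ in range(max_iter):
        m = A.shape[0]
        if m == 1:
            break
        mu = A[-1, -1]                            # shift
        Q, R = np.linalg.qr(A - mu * np.eye(m))
        A = R @ Q + mu * np.eye(m)
        if np.linalg.norm(A[-1, :-1]) <= tol * np.linalg.norm(A):
            eigs.append(A[-1, -1])                # accept alpha_{m-1,m-1}
            A = A[:-1, :-1].copy()                # deflate: continue with A00
    eigs.append(A[0, 0])
    return np.sort(np.array(eigs))

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2
print(np.allclose(eigvals_shifted_qr(A), np.linalg.eigvalsh(A)))
```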

A preprocessing step to make the shifted QR algorithm practical (for Hermitian matrices) first reduces the matrix to tridiagonal form by computing Householder transformations that are applied from the left and right, as illustrated by

\begin{equation*} \begin{array}{| c c c c c} \hline \times \amp {\color{gray} \times} \amp {\color{gray} \times} \amp {\color{gray} \times} \amp {\color{gray} \times} \\ \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} \times} \amp {\color{gray} \times} \\ \times \amp \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} \times} \\ \times \amp \times \amp \times \amp \times \amp {\color{gray} \times} \\ \times \amp \times \amp \times \amp \times \amp \times \end{array} \end{equation*}

\(\begin{array}{c} \\ \\ \\ \longrightarrow \end{array}\)

\begin{equation*} \begin{array}{| c c c c c} \hline \times \amp {\color{gray} \times} \amp {\color{gray} 0} \amp {\color{gray} 0} \amp {\color{gray} 0} \\ \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} \times} \amp {\color{gray} \times} \\ 0 \amp \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} \times} \\ 0 \amp \times \amp \times \amp \times \amp {\color{gray} \times} \\ 0 \amp \times \amp \times \amp \times \amp \times \end{array} \end{equation*}

\(\begin{array}{c} \\ \\ \\ \longrightarrow \end{array}\)

\begin{equation*} \begin{array}{c | c c c c} \times \amp {\color{gray} \times} \amp {\color{gray} 0} \amp {\color{gray} 0} \amp {\color{gray} 0} \\ \hline \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} 0} \amp {\color{gray} 0} \\ 0 \amp \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} \times} \\ 0 \amp 0 \amp \times \amp \times \amp {\color{gray} \times} \\ 0 \amp 0 \amp \times \amp \times \amp \times \end{array} \end{equation*}

Above, left to right: the original matrix, the result of the first iteration, and the result of the second iteration.

\(\begin{array}{c} \\ \\ \\ \longrightarrow \end{array}\)

\begin{equation*} \begin{array}{c c | c c c} \times \amp {\color{gray} \times} \amp {\color{gray} 0} \amp {\color{gray} 0} \amp {\color{gray} 0} \\ \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} 0} \amp {\color{gray} 0} \\ \hline 0 \amp \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} 0} \\ 0 \amp 0 \amp \times \amp \times \amp {\color{gray} \times} \\ 0 \amp 0 \amp 0 \amp \times \amp \times \end{array} \end{equation*}

\(\begin{array}{c} \\ \\ \\ \longrightarrow \end{array}\)

\begin{equation*} \begin{array}{c c c | c c} \times \amp {\color{gray} \times} \amp {\color{gray} 0} \amp {\color{gray} 0} \amp {\color{gray} 0} \\ \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} 0} \amp {\color{gray} 0} \\ 0 \amp \times \amp \times \amp {\color{gray} \times} \amp {\color{gray} 0} \\ \hline 0 \amp 0 \amp \times \amp \times \amp {\color{gray} \times} \\ 0 \amp 0 \amp 0 \amp \times \amp \times \end{array} \end{equation*}

Above, left to right: the result of the third iteration and the completed reduction.

To maintain symmetry at each step, the Householder transformation is applied from the left and the right, so that the update involves a Hermitian rank-2 update:

\begin{equation*} \begin{array}{rcl} A_{22} \amp := \amp H( a_{21} ) A_{22} H( a_{21} ) \\ \amp = \amp ( I - \frac{1}{\tau} u_{21} u_{21}^H ) A_{22} ( I - \frac{1}{\tau} u_{21} u_{21}^H ) \\ \amp = \amp ( A_{22} - \frac{1}{\tau} u_{21} \begin{array}[t]{c} \underbrace{ u_{21}^H A_{22}} \\ y_{21}^H \end{array} ) ( I - \frac{1}{\tau} u_{21} u_{21}^H ) \\ \amp = \amp A_{22} - \frac{1}{\tau} u_{21} y_{21}^H - \frac{1}{\tau} \begin{array}[t]{c} \underbrace{ A_{22} u_{21}}\\ y_{21} \end{array} u_{21}^H + \frac{1}{\tau^2} u_{21} \begin{array}[t]{c} \underbrace{ y_{21}^H u_{21} } \\ 2 \beta \end{array} u_{21}^H \\ \amp= \amp A_{22} - \left( \frac{1}{\tau} u_{21} y_{21}^H - \frac{\beta}{\tau^2}u_{21} u_{21}^H \right) - \left( \frac{1}{\tau} y_{21} u_{21}^H - \frac{\beta}{\tau^2}u_{21} u_{21}^H \right) \\ \amp= \amp A_{22} - u_{21} \begin{array}[t]{c} \underbrace{ \frac{1}{\tau} \left( y_{21}^H - \frac{\beta}{\tau} u_{21}^H \right) } \\ w_{21}^H \end{array} - \begin{array}[t]{c} \underbrace{ \frac{1}{\tau} \left( y_{21} - \frac{\beta}{\tau}u_{21} \right) } \\ w_{21} \end{array} u_{21}^H \\ \amp = \amp \begin{array}[t]{c} \underbrace{ A_{22} - u_{21} w_{21}^H - w_{21} u_{21}^H. } \\ \mbox{Hermitian}\\ \mbox{rank-2 update} \end{array} \end{array} \end{equation*}

This yields the algorithm given by

\begin{equation*} \newcommand{\routinename}{ \left[ A, t \right] := \mbox{TriRed-unb}( A,t ) } \newcommand{\guard}{ m( A_{TL} ) \lt m( A )-2 } \newcommand{\partitionings}{ A \rightarrow \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}}, t \rightarrow \FlaTwoByOne{ t_{T} }{t_B} } \newcommand{\partitionsizes}{ A_{TL} {\rm ~is~} 0 \times 0 \mbox{ and } t_T \mbox{ has } 0 \mbox{ elements} } \newcommand{\repartitionings}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \rightarrow \FlaThreeByThreeBR{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}}, \FlaTwoByOne{t_T}{t_B} \rightarrow \FlaThreeByOneB{t_0}{\tau_1}{t_2} } \newcommand{\moveboundaries}{ \FlaTwoByTwo{A_{TL}}{A_{TR}} {A_{BL}}{A_{BR}} \leftarrow \FlaThreeByThreeTL{A_{00}}{a_{01}}{A_{02}} {a_{10}^T}{\alpha_{11}}{a_{12}^T} {A_{20}}{a_{21}}{A_{22}}, \FlaTwoByOne{t_T}{t_B} \leftarrow \FlaThreeByOneT{t_0}{\tau_1}{t_2} } \newcommand{\update}{ \begin{array}{l} [ a_{21}, \tau_1 ] := {\rm Housev1}( a_{21} ) \\ u_{21} = a_{21} \mbox{ with first element replaced with 1 } \\ \mbox{ Update } A_{22} := H( u_{21} ) A_{22} H( u_{21} ) \mbox{ via the steps } \\ ~~~~~~ \left\{ \begin{array}{ll} y_{21} := A_{22} u_{21} \amp \mbox{(Hermitian matrix-vector multiply!)}\\ \beta := u_{21}^H y_{21} / 2 \\ w_{21} := ( y_{21} - \beta u_{21} / \tau_1 ) / \tau_1 \\ A_{22} := A_{22} - \mbox{tril}(u_{21} w_{21}^H + w_{21} u_{21}^H) \amp \mbox{(Hermitian~rank-2~update)} \end{array} \right. \end{array} } \FlaAlgorithm \end{equation*}
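A NumPy translation of this algorithm can be sketched for real symmetric matrices (an illustration, not the text's implementation: `housev` and `trired` are hypothetical helper names, and for simplicity the full \(A_{22} \) is updated rather than only its lower triangle):

```python
import numpy as np

def housev(x):
    # Compute u (with u[0] = 1) and tau = (u^T u)/2 so that
    # H = I - (1/tau) u u^T reflects x onto a multiple of e0.
    normx = np.linalg.norm(x)
    if normx == 0.0:
        u = np.zeros_like(x)
        u[0] = 1.0
        return u, np.inf, 0.0              # H = I, since 1/tau = 0
    alpha = -np.copysign(normx, x[0])      # sign choice avoids cancellation
    u = x.copy()
    u[0] -= alpha
    u /= u[0]                              # normalize so u[0] = 1
    return u, (u @ u) / 2.0, alpha

def trired(A):
    # Reduce a real symmetric matrix to tridiagonal form via two-sided
    # Householder transformations and symmetric rank-2 updates.
    A = A.copy()
    m = A.shape[0]
    for j in range(m - 2):
        u, tau, alpha = housev(A[j + 1:, j])
        A[j + 1, j] = A[j, j + 1] = alpha  # new sub/superdiagonal entry
        A[j + 2:, j] = 0.0
        A[j, j + 2:] = 0.0
        A22 = A[j + 1:, j + 1:]
        y = A22 @ u                        # symmetric matrix-vector multiply
        beta = (u @ y) / 2.0
        w = (y - (beta / tau) * u) / tau
        A[j + 1:, j + 1:] = A22 - np.outer(u, w) - np.outer(w, u)
    return A

rng = np.random.default_rng(4)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2
T = trired(A)
print(np.allclose(np.linalg.eigvalsh(T), np.linalg.eigvalsh(A)))  # similar
print(np.allclose(T, np.triu(np.tril(T, 1), -1)))                 # tridiagonal
```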

The cost, in flops, for reducing a symmetric matrix to tridiagonal form is given by (approximately)

\begin{equation*} \frac{4}{3} m^3. \end{equation*}

(For a Hermitian matrix, this is multiplied by \(4\) since each complex flop requires \(4 \) real flops.)

A Givens' rotation is a \(2 \times 2 \) unitary matrix such that

\begin{equation*} \left( \begin{array}{c c} \gamma \amp -\sigma \\ \sigma \amp \gamma \end{array} \right)^T \begin{array}[t]{c} \underbrace{ \left( \begin{array}{c} \chi_1 \\ \chi_2 \end{array} \right) } \\ x \end{array} = \left( \begin{array}{c} \| x \|_2 \\ 0 \end{array} \right) , \end{equation*}

where \(\gamma^2 + \sigma^2 = 1 \text{.}\) In other words, \(\gamma \) and \(\sigma \) can be thought of as the cosine and sine of an angle \(\theta \text{.}\)
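A minimal NumPy sketch (an illustration, not from the text; `givens` is a hypothetical helper name) of computing such a rotation and applying it as in the equation above:

```python
import numpy as np

def givens(chi1, chi2):
    # gamma = cos(theta), sigma = sin(theta) so that the rotation's
    # transpose maps (chi1, chi2) to (||x||_2, 0).
    nrm = np.hypot(chi1, chi2)   # robust ||x||_2
    if nrm == 0.0:
        return 1.0, 0.0          # x = 0: the identity rotation works
    return chi1 / nrm, chi2 / nrm

x = np.array([3.0, 4.0])
gamma, sigma = givens(*x)
G = np.array([[gamma, -sigma],
              [sigma,  gamma]])
print(np.allclose(G.T @ x, [np.linalg.norm(x), 0.0]))  # second entry zeroed
print(np.isclose(gamma**2 + sigma**2, 1.0))
```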

The Francis Implicit Q step is one of the most elegant insights in numerical linear algebra, and it cannot be succinctly summarized. Hence, one should carefully internalize Unit 10.3.5, in particular the "chasing of the bulge" described there. How this fits into a complete implicitly shifted QR algorithm for computing the eigenvalues and eigenvectors of a Hermitian matrix is discussed in Unit 10.3.6.