
Subsection 10.2.1 Finding more eigenvectors and eigenvalues

The idea behind subspace iteration is to perform the Power Method with more than one vector in order to converge to (a subspace spanned by) the eigenvectors associated with a set of eigenvalues.

We start our discussion by restricting ourselves to the case where \(A \) is Hermitian. Why? Because the eigenvectors associated with distinct eigenvalues of a Hermitian matrix are mutually orthogonal, which will simplify our discussion.

Recall that when we analyzed the convergence of the Power Method, we commented on the fact that the method converges to an eigenvector associated with the largest eigenvalue if two conditions are met:

  • \(\vert \lambda_0 \vert \gt \vert \lambda_1 \vert \text{.}\)

  • \(v^{(0)} \) has a component in the direction of the eigenvector, \(x_0 \text{,}\) associated with \(\lambda_0 \text{.}\)

The second condition is not that important, for two reasons: if we start with a random vector \(v^{(0)} \text{,}\) it almost surely has a component in the direction of \(x_0 \text{,}\) and even if it doesn't, roundoff error will introduce such a component over time.

Notice that \(v^{(0)} \) does not have a component in the direction of \(x_0 \) if it is orthogonal to \(x_0 \text{.}\) So, if we know \(x_0 \text{,}\) then we can pick a random vector and subtract out the component in the direction of \(x_0 \text{,}\) and make this our vector \(v^{(0)} \text{.}\) If we then start the Power Method with this new vector (and don't introduce roundoff error in a way that introduces a component in the direction of \(x_0 \)), then the iteration will home in on a vector associated with \(\lambda_1 \) (provided \(A \) is Hermitian, \(\vert \lambda_1 \vert \gt \vert \lambda_2 \vert \text{,}\) and \(v^{(0)} \) has a component in the direction of \(x_1 \text{.}\)) This iteration would look like

\begin{equation*} \begin{array}{l} v_1 := \mbox{ random vector} \\ v_1^\perp := v_1 - x_0^Hv_1 x_0 \\ v_1^{(0)} := v_1^\perp / \| v_1^\perp \|_2 \\ {\bf for~} k:=0, \ldots \\ ~~~ v_1 := A v_1^{(k)} \\ ~~~ v_1^{(k+1)} := v_1 / \| v_1 \|_2 \\ {\bf endfor} \end{array} \end{equation*}

Because we would be concerned about the introduction of a component in the direction of \(x_0 \) due to roundoff error, we may want to subtract out that component as part of the iteration:

\begin{equation*} \begin{array}{l} x_0 := x_0 / \| x_0 \|_2 \\ v_1 := \mbox{ random vector} \\ v_1^\perp := v_1 - x_0^H v_1 x_0 \\ v_1^{(0)} := v_1^\perp / \| v_1^\perp \|_2 \\ {\bf for~} k:=0, \ldots \\ ~~~ v_1 := A v_1^{(k)} \\ ~~~ v_1^\perp := v_1 - x_0^H v_1 x_0 \\ ~~~ v_1^{(k+1)} := v_1^\perp / \| v_1^\perp \|_2 \\ {\bf endfor} \end{array} \end{equation*}
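
In code, this last iteration might be realized as follows. This is only a minimal NumPy sketch, not part of the text: the function name deflated_power_method, the fixed iteration count, and the random starting vector are illustrative choices, and we assume \(A \) is Hermitian and \(x_0 \) is a known eigenvector associated with \(\lambda_0 \text{.}\)

```python
import numpy as np

def deflated_power_method(A, x0, num_iters=100, rng=np.random.default_rng()):
    # Power Method that keeps the iterate orthogonal to the known eigenvector x0,
    # so it homes in on an eigenvector associated with the next-largest
    # (in magnitude) eigenvalue.  Assumes A is Hermitian.
    x0 = x0 / np.linalg.norm(x0)              # x0 := x0 / || x0 ||_2
    v1 = rng.standard_normal(A.shape[0])      # v1 := random vector
    v1 = v1 - (x0.conj() @ v1) * x0           # subtract out the component in the direction of x0
    v1 = v1 / np.linalg.norm(v1)              # v1^(0) := v1perp / || v1perp ||_2
    for _ in range(num_iters):
        v1 = A @ v1                           # v1 := A v1^(k)
        v1 = v1 - (x0.conj() @ v1) * x0       # re-orthogonalize against x0 (guards against roundoff)
        v1 = v1 / np.linalg.norm(v1)          # v1^(k+1) := v1perp / || v1perp ||_2
    return v1
```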

The steps that subtract out the component of \(v_1 \) in the direction of \(x_0\) are exactly those performed by the Gram-Schmidt process. Think of it as computing the QR factorization of the matrix \(\left( \begin{array}{c|c} x_0 \amp v_1 \end{array} \right) \text{:}\)

\begin{equation*} \begin{array}{l} v_1 := \mbox{ random vector} \\ ( \left( \begin{array}{c | c} x_0 \amp v_1^{(0)} \end{array} \right), R ) := {\rm QR}( \left( \begin{array}{c | c} x_0 \amp v_1 \end{array} \right) ) \\ {\bf for~} k:=0, \ldots \\ ~~~( \left( \begin{array}{c | c} x_0 \amp v_1^{(k+1)} \end{array} \right), R ) := {\rm QR}( \left( \begin{array}{c | c} x_0 \amp A v_1^{(k)} \end{array} \right) ) \\ {\bf endfor} \end{array} \end{equation*}
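
To see that the QR factorization of \(\left( \begin{array}{c|c} x_0 \amp v_1 \end{array} \right) \) indeed performs these Gram-Schmidt steps, one can compare the two computations directly. The following NumPy check (with a random unit vector standing in for \(x_0 \text{,}\) purely for illustration) confirms that the second column of \(Q \) equals the normalized \(v_1^\perp \) up to sign:

```python
import numpy as np

rng = np.random.default_rng()
n = 5
x0 = rng.standard_normal(n)
x0 = x0 / np.linalg.norm(x0)                  # stand-in for the (unit length) eigenvector x0
v1 = rng.standard_normal(n)                   # random vector

# Gram-Schmidt by hand: subtract out the component in the direction of x0 and normalize
v1_perp = v1 - (x0 @ v1) * x0
v1_perp = v1_perp / np.linalg.norm(v1_perp)

# The same computation via a QR factorization of ( x0 | v1 )
Q, R = np.linalg.qr(np.column_stack([x0, v1]))

print(np.allclose(abs(Q[:, 1] @ v1_perp), 1.0))   # True: Q[:, 1] is v1_perp up to sign
```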

The problem is that we typically don't know \(x_0 \) up front. What we can instead do is to iterate with two random vectors, where the first converges to a vector associated with \(\lambda_0 \) and the second to one associated with \(\lambda_1 \text{:}\)

\begin{equation*} \begin{array}{l} v_0 := \mbox{ random vector} \\ v_1 := \mbox{ random vector} \\ ( \left( \begin{array}{c | c} v_0^{(0)} \amp v_1^{(0)} \end{array} \right), R ) := {\rm QR}( \left( \begin{array}{c | c} v_0 \amp v_1 \end{array} \right) ) \\ {\bf for~} k:=0, \ldots \\ ~~~( \left( \begin{array}{c | c} v_0^{(k+1)} \amp v_1^{(k+1)} \end{array} \right), R ) := {\rm QR}( \left( \begin{array}{c | c} A v_0^{(k)} \amp A v_1^{(k)} \end{array} \right) ) \\ {\bf endfor} \end{array} \end{equation*}
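
A minimal NumPy sketch of this simultaneous iteration follows; the function name and iteration count are again illustrative choices, and we assume \(A \) is Hermitian with \(\vert \lambda_0 \vert \gt \vert \lambda_1 \vert \gt \vert \lambda_2 \vert \text{.}\)

```python
import numpy as np

def subspace_iteration_two_vectors(A, num_iters=100, rng=np.random.default_rng()):
    # Iterate with two vectors at once, re-orthonormalizing them via a QR
    # factorization after every multiplication by A.  The columns of V are
    # the iterates v_0^(k) and v_1^(k).
    n = A.shape[0]
    V = rng.standard_normal((n, 2))           # v_0, v_1 := random vectors
    V, _ = np.linalg.qr(V)                    # ( v_0^(0) | v_1^(0) ), R := QR( v_0 | v_1 )
    for _ in range(num_iters):
        V, _ = np.linalg.qr(A @ V)            # ( v_0^(k+1) | v_1^(k+1) ), R := QR( A v_0^(k) | A v_1^(k) )
    return V
```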

We observe:

  • The vectors \(v_0^{(k)} \) are then expected to converge linearly to a vector in the direction of \(x_0 \) at a rate dictated by the ratio \(\vert \lambda_1 \vert / \vert \lambda_0 \vert \text{.}\)

  • The vectors \(v_1^{(k)} \) are similarly expected to converge linearly to a vector in the direction of \(x_1 \) at a rate dictated by the ratio \(\vert \lambda_2 \vert / \vert \lambda_1 \vert \text{.}\)

  • If \(\vert \lambda_0 \vert = \vert \lambda_1 \vert\) and \(\vert \lambda_1 \vert \gt \vert \lambda_2 \vert \text{,}\) then \(\Span( \{ v_0^{(k)}, v_1^{(k)} \} ) \) will eventually start approximating \(\Span( \{ x_0, x_1 \} ) \text{.}\)

So, we now have a method for finding two eigenvectors. The associated eigenvalues can be approximated via the Rayleigh quotient:

\begin{equation*} \lambda_0 \approx \lambda_0^{(k)} = {v_0^{(k)}}^H A v_0^{(k)} \end{equation*}

and

\begin{equation*} \lambda_1 \approx \lambda_1^{(k)} = {v_1^{(k)}}^H A v_1^{(k)} \end{equation*}

Alternatively, we note that for large \(k \)

\begin{equation*} A^{(k)} = \left( \begin{array}{c| c} v_0^{(k)} \amp v_1^{(k)} \end{array} \right)^H A \left( \begin{array}{c| c} v_0^{(k)} \amp v_1^{(k)} \end{array} \right) \approx \left( \begin{array}{c| c} \lambda_0 \amp 0 \\ \hline 0 \amp \lambda_1 \end{array} \right) \end{equation*}

if \(A \) is Hermitian, \(\vert \lambda_1 \vert \gt \vert \lambda_2 \vert \text{,}\) and \(v_0^{(0)} \) and \(v_1^{(0)} \) have components in the directions of \(x_0 \) and \(x_1 \text{,}\) respectively.
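
Putting these pieces together, the sketch below estimates \(\lambda_0 \) and \(\lambda_1 \) via Rayleigh quotients and forms the projected matrix \(A^{(k)} \text{.}\) The \(5 \times 5 \) symmetric test matrix and the iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                             # arbitrary real symmetric (hence Hermitian) test matrix

V, _ = np.linalg.qr(rng.standard_normal((5, 2)))
for _ in range(200):                          # subspace iteration as above
    V, _ = np.linalg.qr(A @ V)

lambda0 = V[:, 0] @ A @ V[:, 0]               # Rayleigh quotient estimate of lambda_0
lambda1 = V[:, 1] @ A @ V[:, 1]               # Rayleigh quotient estimate of lambda_1

Ak = V.T @ A @ V                              # A^(k) = ( v_0^(k) | v_1^(k) )^H A ( v_0^(k) | v_1^(k) )

print(lambda0, lambda1)
print(np.round(Ak, 6))                        # approximately diag(lambda_0, lambda_1)
print(sorted(np.linalg.eigvalsh(A), key=abs, reverse=True)[:2])   # the two eigenvalues largest in magnitude
```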