Subsection 9.3.4 The Rayleigh Quotient Iteration

A basic idea that allows one to accelerate the convergence of the inverse iteration is captured by the following exercise:

Homework 9.3.4.1.

Let \(A \in \mathbb C^{m \times m} \text{,}\) \(\rho \in \mathbb C \text{,}\) and \(( \lambda, x ) \) an eigenpair of \(A \) .

Which of the following is an eigenpair of the shifted matrix \(A - \rho I \text{:}\)

  • \(( \lambda, x ) \text{.}\)

  • \(( \lambda, A^{-1} x ) \text{.}\)

  • \(( \lambda - \rho, x ) \text{.}\)

  • \(( 1/(\lambda - \rho), x ) \text{.}\)

Answer

\(( \lambda - \rho, x ) \text{.}\)

Now justify your answer.

Solution

Let \(A x = \lambda x \text{.}\) Then

\begin{equation*} ( A - \rho I ) x = A x - \rho x = \lambda x - \rho x = ( \lambda - \rho ) x. \end{equation*}

We conclude that \(( \lambda- \rho, x ) \) is an eigenpair of \(A-\rho I \text{.}\)

The matrix \(A - \rho I \) is referred to as the matrix \(A \) that has been "shifted" by \(\rho \text{.}\) What the next lemma captures is that shifting \(A \) by \(\rho\) shifts the spectrum of \(A \) by \(\rho \text{:}\)

Lemma 9.3.4.1.

Let \(A \in \mathbb C^{m \times m} \text{,}\) \(A = X \Lambda X^{-1} \text{,}\) and \(\rho \in \mathbb C \text{.}\) Then \(A - \rho I = X ( \Lambda - \rho I ) X^{-1} \text{.}\)

Homework 9.3.4.2.

Prove Lemma 9.3.4.1.

Solution
\begin{equation*} A - \rho I = X \Lambda X^{-1} - \rho X X^{-1} = X ( \Lambda - \rho I ) X^{-1} . \end{equation*}
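A quick numerical check of this shift property (a minimal NumPy sketch; the symmetric test matrix and the shift \(\rho\) below are arbitrary choices made only for illustration):

```python
import numpy as np

# Check numerically that the eigenvalues of A - rho I are those of A, shifted by rho.
rng = np.random.default_rng(0)
m = 5
B = rng.standard_normal((m, m))
A = (B + B.T) / 2                 # symmetric, so eigvalsh returns sorted real eigenvalues
rho = 0.7

lam = np.linalg.eigvalsh(A)                            # spectrum of A
lam_shifted = np.linalg.eigvalsh(A - rho * np.eye(m))  # spectrum of A - rho I

print(np.allclose(lam_shifted, lam - rho))             # True (up to rounding error)
```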

This suggests the following (naive) iteration: Pick a value \(\rho \) close to \(\lambda_{m-1} \text{.}\) Iterate

\begin{equation*} \begin{array}{l} \mbox{Pick } v^{(0)} \mbox{ of unit length} \\ {\bf for~} k = 0, \ldots\\ ~~~ v^{(k+1)} = ( A - \rho I )^{-1} v^{(k)} \\ ~~~ v^{(k+1)} = ( \lambda_{m-1} - \rho ) v^{(k+1)} \\ {\bf endfor} \end{array} \end{equation*}

Of course one would solve \(( A - \rho I ) v^{(k+1)} = v^{(k)} \) rather than computing and applying the inverse of \(A - \rho I \text{.}\)

If we index the eigenvalues so that

\begin{equation*} \vert \lambda_{m-1} - \rho \vert \lt \vert \lambda_{m-2} - \rho \vert \leq \cdots \leq \vert \lambda_{0} - \rho \vert \end{equation*}

then

\begin{equation*} \begin{array}{l} \| v^{(k+1)} - \psi_{m-1} x_{m-1} \|_{X^{-1}} \leq \left\vert \frac{\lambda_{m-1}- \rho}{\lambda_{m-2} - \rho} \right\vert \| v^{(k)} - \psi_{m-1} x_{m-1} \|_{X^{-1}}. \end{array} \end{equation*}

The closer the shift \(\rho \) is chosen to \(\lambda_{m-1} \text{,}\) the smaller the ratio \(\vert \lambda_{m-1} - \rho \vert / \vert \lambda_{m-2} - \rho \vert \) that dictates the linear convergence of this shifted Inverse Power Method, and hence the faster the convergence.

A more practical algorithm is given by

\begin{equation*} \begin{array}{l} \mbox{Pick } v^{(0)} \mbox{ of unit length} \\ {\bf for~} k = 0, \ldots \\ ~~~ v^{(k+1)} = ( A - \rho I )^{-1} v^{(k)} \\ ~~~ v^{(k+1)} = v^{(k+1)} / \| v^{(k+1)} \| \\ {\bf endfor} \end{array} \end{equation*}

where, instead of multiplying by the inverse, one would solve the linear system \(( A - \rho I ) v^{(k+1)} = v^{(k)} \text{.}\)
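The following is a minimal NumPy sketch of this practical shifted inverse iteration. The function name, the symmetric test matrix, the fixed iteration count, and the choice of shift are illustrative assumptions, not part of the text.

```python
import numpy as np

def shifted_inverse_iteration(A, rho, num_iters=20):
    """Shifted Inverse Power Method (sketch): repeatedly solve
    (A - rho I) v^(k+1) = v^(k) and normalize. Converges to the
    eigenvector whose eigenvalue is closest to the shift rho."""
    m = A.shape[0]
    v = np.ones(m) / np.sqrt(m)          # v^(0) of unit length
    M = A - rho * np.eye(m)              # shifted matrix; solve with it, never invert it
    for _ in range(num_iters):
        v = np.linalg.solve(M, v)        # v^(k+1) = (A - rho I)^{-1} v^(k), via a solve
        v = v / np.linalg.norm(v)        # normalize to unit length
    return v

# Example: arbitrary symmetric matrix, shift chosen as a rough eigenvalue guess.
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
v = shifted_inverse_iteration(A, rho=0.5)
print(v @ A @ v)   # Rayleigh quotient: approximates the eigenvalue of A closest to 0.5
```

Since the shift does not change from iteration to iteration, in practice one would factor \(A - \rho I \) once (for example, with an LU factorization) and reuse that factorization in every solve.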

The question now becomes how to choose \(\rho \) so that it is a good guess for \(\lambda_{m-1} \text{.}\) Often an application inherently supplies a reasonable approximation for the smallest eigenvalue or an eigenvalue of particular interest. Alternatively, we know that eventually \(v^{(k)} \) becomes a good approximation for \(x_{m-1} \) and therefore the Rayleigh quotient gives us a way to find a good approximation for \(\lambda_{m-1} \text{.}\) This suggests the (naive) Rayleigh Quotient Iteration:

\begin{equation*} \begin{array}{l} \mbox{Pick } v^{(0)} \mbox{ of unit length} \\ {\bf for~} k = 0, \ldots \\ ~~~ \rho_k = v^{(k)\,H} A v^{(k)} / (v^{(k)\,H} v^{(k)}) \\ ~~~ v^{(k+1)} = ( A - \rho_k I )^{-1} v^{(k)} \\ ~~~ v^{(k+1)} = ( \lambda_{m-1} -\rho_k ) v^{(k+1)} \\ {\bf endfor} \end{array} \end{equation*}

Here \(\lambda_{m-1} \) is the eigenvalue to which the method eventually converges. As before, the convergence is dictated by

\begin{equation*} \begin{array}{l} \| v^{(k+1)} - \psi_{m-1} x_{m-1} \|_{X^{-1}} \leq \left\vert \frac{\lambda_{m-1}- \rho_k}{\lambda_{m-2} - \rho_k} \right\vert \| v^{(k)} - \psi_{m-1} x_{m-1} \|_{X^{-1}} \end{array} \end{equation*}

with

\begin{equation*} \lim_{k\rightarrow \infty} (\lambda_{m-1}- \rho_k) = 0 \end{equation*}

which means superlinear convergence is observed. In fact, it can be shown that once \(k \) is large enough

\begin{equation*} \begin{array}{l} \| v^{(k+1)} - \psi_{m-1} x_{m-1} \|_{X^{-1}} \leq C \| v^{(k)} - \psi_{m-1} x_{m-1} \|_{X^{-1}}^2, \end{array} \end{equation*}

thus achieving quadratic convergence. Roughly speaking, this means that every iteration doubles the number of correct digits in the current approximation. To prove this, one shows that \(\vert \lambda_{m-1} - \rho_k \vert \leq K \| v^{(k)} - \psi_{m-1} x_{m-1} \|_{X^{-1}}\) for some constant \(K \text{;}\) the details go beyond this discussion.

Better yet, it can be shown that if \(A \) is Hermitian, then (once \(k \) is large enough)

\begin{equation*} \begin{array}{l} \| v^{(k+1)} - \psi_{m-1} x_{m-1} \| \leq C \| v^{(k)} - \psi_{m-1} x_{m-1} \|^3 \end{array} \end{equation*}

for some constant \(C \) and hence the naive Rayleigh Quotient Iteration achieves cubic convergence for Hermitian matrices. Here the norm \(\| \cdot \|_{X^{-1}} \) can be replaced by any p-norm, since the Spectral Decomposition Theorem tells us that for Hermitian matrices \(X \) can be taken to be unitary. Roughly speaking, this means that every iteration triples the number of correct digits in the current approximation. This is mind-bogglingly fast convergence!

A practical Rayleigh Quotient Iteration is given by

\begin{equation*} \begin{array}{l} v^{(0)} = v^{(0)} / \| v^{(0)} \|_2 \\ {\bf for~} k = 0, \ldots \\ ~~~\rho_k = v^{(k)\,H} A v^{(k)} ~~~~~~ \mbox{(Now } \| v^{(k)} \|_2 = 1 \mbox{)} \\ ~~~ v^{(k+1)} = ( A - \rho_k I )^{-1} v^{(k)} \\ ~~~ v^{(k+1)} = v^{(k+1)} / \| v^{(k+1)} \|_2 \\ {\bf endfor} \end{array} \end{equation*}
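Below is a minimal NumPy sketch of this practical Rayleigh Quotient Iteration. The function name, the fixed iteration count, and the real symmetric (Hermitian) test matrix are assumptions made for illustration only.

```python
import numpy as np

def rayleigh_quotient_iteration(A, v0, num_iters=10):
    """Practical Rayleigh Quotient Iteration (sketch): the shift in each
    step is the Rayleigh quotient of the current unit-length iterate."""
    m = A.shape[0]
    v = v0 / np.linalg.norm(v0)              # v^(0) of unit length
    rho = v.conj() @ A @ v                   # rho_0 (no division needed: ||v||_2 = 1)
    for _ in range(num_iters):
        try:
            v = np.linalg.solve(A - rho * np.eye(m), v)   # solve (A - rho_k I) v^(k+1) = v^(k)
        except np.linalg.LinAlgError:
            break                            # shift hit an eigenvalue exactly; stop
        v = v / np.linalg.norm(v)            # normalize
        rho = v.conj() @ A @ v               # rho_{k+1}
    return rho, v

# Example: arbitrary real symmetric matrix and random starting vector.
rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2
rho, v = rayleigh_quotient_iteration(A, rng.standard_normal(5))
print(rho, np.linalg.norm(A @ v - rho * v))  # residual drops extremely fast
```

Printing the residual after every iteration (rather than only at the end) illustrates the rapid convergence discussed above, until rounding error dominates.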
Remark 9.3.4.2.

A concern with the (Shifted) Inverse Power Method and Rayleigh Quotient Iteration is that the matrix with which one solves is likely nearly singular. It turns out that this actually helps: the error that is amplified most by the near singularity lies in the direction of the eigenvector associated with the smallest (in magnitude) eigenvalue of the shifted matrix, which is exactly the eigenvector being sought.