
Subsection 9.3.2 The Power Method: Convergence

Before we make the algorithm practical, let us examine how fast the iteration converges. This requires a few definitions regarding rates of convergence.

Definition 9.3.2.1. Convergence of a sequence of scalars.

Let \(\alpha_0, \alpha_1 , \alpha_2, \ldots \in \R \) be an infinite sequence of scalars. Then \(\alpha_k \) is said to converge to \(\alpha \) if

\begin{equation*} \lim_{k \rightarrow \infty} \vert \alpha_k - \alpha \vert = 0 . \end{equation*}
Definition 9.3.2.2. Convergence of a sequence of vectors.

Let \(x_0, x_1 , x_2, \ldots \in \C^m \) be an infinite sequence of vectors. Then \(x_k \) is said to converge to \(x \) if for any norm \(\| \cdot \| \)

\begin{equation*} \lim_{k \rightarrow \infty} \| x_k - x \| = 0. \end{equation*}

Because of the equivalence of norms, discussed in Subsection 1.2.6, if a sequence of vectors converges in one norm, then it converges in all norms.

Definition 9.3.2.3. Rate of convergence.

Let \(\alpha_0, \alpha_1 , \alpha_2, \ldots \in \R \) be an infinite sequence of scalars that converges to \(\alpha \text{.}\) Then

  • \(\alpha_k \) is said to converge linearly to \(\alpha \) if for sufficiently large \(k \)

    \begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert \end{equation*}

    for some constant \(C \lt 1 \text{.}\) In other words, if

    \begin{equation*} \lim_{k \rightarrow \infty} \frac {\vert \alpha_{k+1} - \alpha \vert} {\vert \alpha_{k} - \alpha \vert} = C \lt 1. \end{equation*}
  • \(\alpha_k \) is said to converge superlinearly to \(\alpha \) if for sufficiently large \(k \)

    \begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C_k \vert \alpha_{k} - \alpha \vert \end{equation*}

    with \(C_k \rightarrow 0 \text{.}\) In other words, if

    \begin{equation*} \lim_{k \rightarrow \infty} \frac {\vert \alpha_{k+1} - \alpha \vert} {\vert \alpha_{k} - \alpha \vert} = 0. \end{equation*}
  • \(\alpha_k \) is said to converge quadratically to \(\alpha \) if for sufficiently large \(k \)

    \begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert ^2 \end{equation*}

    for some constant \(C \text{.}\) In other words, if

    \begin{equation*} \lim_{k \rightarrow \infty} \frac {\vert \alpha_{k+1} - \alpha \vert} {\vert \alpha_{k} - \alpha \vert^2} = C . \end{equation*}
  • \(\alpha_k \) is said to converge cubically to \(\alpha \) if for large enough \(k \)

    \begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert ^3 \end{equation*}

    for some constant \(C \text{.}\) In other words, if

    \begin{equation*} \lim_{k \rightarrow \infty} \frac {\vert \alpha_{k+1} - \alpha \vert} {\vert \alpha_{k} - \alpha \vert^3} = C . \end{equation*}
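To make these rates concrete, here is a small illustrative sketch (ours, not from the text): three sequences converging to \(\alpha = 0 \text{,}\) one at each rate. The constants (\(C = 0.5\) for the linear sequence, \(1/k!\) for the superlinear one, \(\alpha_0 = 0.1\) for the quadratic one) are our own choices.

```python
import math

# Three sequences converging to alpha = 0, one per rate above.
# All constants are illustrative choices, not from the text.

# Linear: alpha_{k+1} = C * alpha_k with C = 0.5 < 1.
linear = [1.0]
for _ in range(5):
    linear.append(0.5 * linear[-1])

# Superlinear: alpha_k = 1/k!, so alpha_{k+1}/alpha_k = 1/(k+1) -> 0.
superlinear = [1.0 / math.factorial(k) for k in range(6)]

# Quadratic: alpha_{k+1} = alpha_k**2 (C = 1), starting from 0.1.
quadratic = [0.1]
for _ in range(4):
    quadratic.append(quadratic[-1] ** 2)

# The defining ratios: constant C for the linear sequence, tending to 0
# for the superlinear one, constant C for the quadratic one (note the
# square in the denominator).
lin_ratios = [linear[k + 1] / linear[k] for k in range(5)]
sup_ratios = [superlinear[k + 1] / superlinear[k] for k in range(5)]
quad_ratios = [quadratic[k + 1] / quadratic[k] ** 2 for k in range(4)]
```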

Linear convergence can be slow. Let's say that for \(k \geq K \) we observe that

\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert . \end{equation*}

Then, clearly, \(\vert \alpha_{k+n} - \alpha\vert \leq C^n \vert \alpha_{k} - \alpha\vert \text{.}\) If \(C = 0.99 \text{,}\) progress may be very, very slow. If \(\vert \alpha_k - \alpha \vert = 1 \text{,}\) then

\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha \vert \amp\leq\amp 0.99000 \\ \vert \alpha_{k+2} - \alpha \vert \amp\leq\amp 0.98010 \\ \vert \alpha_{k+3} - \alpha \vert \amp\leq\amp 0.97030 \\ \vert \alpha_{k+4} - \alpha \vert \amp\leq\amp 0.96060 \\ \vert \alpha_{k+5} - \alpha \vert \amp\leq\amp 0.95099 \\ \vert \alpha_{k+6} - \alpha \vert \amp\leq\amp 0.94148 \\ \vert \alpha_{k+7} - \alpha \vert \amp\leq\amp 0.93206 \\ \vert \alpha_{k+8} - \alpha \vert \amp\leq\amp 0.92274 \\ \vert \alpha_{k+9} - \alpha \vert \amp\leq\amp 0.91351 \end{array} \end{equation*}
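A quick check of the table above (ours, not from the text): with \(C = 0.99 \) and \(\vert \alpha_k - \alpha \vert = 1 \text{,}\) the bound after \(n \) steps is simply \(0.99^n \text{.}\)

```python
# Reproduce the linear-convergence table: the error bound after n steps
# is C**n with C = 0.99, starting from an error of 1.
bounds = [0.99 ** n for n in range(1, 10)]
# After nine iterations the bound has only dropped to about 0.91352.
```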

Quadratic convergence is fast. Now

\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha\vert \amp\leq\amp C \vert \alpha_{k} - \alpha\vert ^2 \\ \vert \alpha_{k+2} - \alpha\vert \amp\leq\amp C \vert \alpha_{k+1} - \alpha\vert ^2 \leq C ( C \vert \alpha_{k} - \alpha\vert ^2 )^2 = C^3 \vert \alpha_{k} - \alpha\vert ^4 \\ \vert \alpha_{k+3} - \alpha\vert \amp\leq\amp C \vert \alpha_{k+2} - \alpha\vert ^2 \leq C ( C^3 \vert \alpha_{k} - \alpha\vert ^4 )^2 = C^7 \vert \alpha_{k} - \alpha\vert ^8 \\ \amp \vdots \amp \\ \vert \alpha_{k+n} - \alpha\vert \amp\leq\amp C^{2^n-1} \vert \alpha_{k} - \alpha\vert ^{2^n} \end{array} \end{equation*}

Even if \(C = 0.99 \) and \(\vert \alpha_k - \alpha \vert = 1 \text{,}\) then

\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha \vert \amp\leq\amp 0.99000 \\ \vert \alpha_{k+2} - \alpha \vert \amp\leq\amp 0.970299 \\ \vert \alpha_{k+3} - \alpha \vert \amp\leq\amp 0.932065 \\ \vert \alpha_{k+4} - \alpha \vert \amp\leq\amp 0.860058 \\ \vert \alpha_{k+5} - \alpha \vert \amp\leq\amp 0.732303 \\ \vert \alpha_{k+6} - \alpha \vert \amp\leq\amp 0.530905 \\ \vert \alpha_{k+7} - \alpha \vert \amp\leq\amp 0.279042 \\ \vert \alpha_{k+8} - \alpha \vert \amp\leq\amp 0.077085 \\ \vert \alpha_{k+9} - \alpha \vert \amp\leq\amp 0.005882 \\ \vert \alpha_{k+10} - \alpha \vert \amp\leq\amp 0.000034 \end{array} \end{equation*}
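The same table can be reproduced numerically (our sketch): iterate the bound \(e_{k+1} = C e_k^2 \) with \(C = 0.99 \) and \(e_0 = 1 \text{,}\) and check it against the closed form \(C^{2^n-1} e_0^{2^n} \) derived above.

```python
# Iterate the quadratic bound e_{k+1} = C * e_k**2 with C = 0.99, e_0 = 1.
C, e = 0.99, 1.0
errors = []
for _ in range(10):
    e = C * e * e
    errors.append(e)

# Closed form derived above: after n steps the bound is C**(2**n - 1) * e_0**(2**n).
closed_form = [C ** (2 ** n - 1) for n in range(1, 11)]
```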

If we consider \(\alpha \) the correct result then, eventually, the number of correct digits roughly doubles in each iteration. This can be explained as follows: If \(\vert \alpha_{k} - \alpha \vert \lt 1 \text{,}\) then the number of correct decimal digits is given by

\begin{equation*} - \log_{10} \vert \alpha_{k} - \alpha \vert . \end{equation*}

Since \(\log_{10} \) is a monotonically increasing function,

\begin{equation*} \begin{array}{l} \log_{10} \vert \alpha_{k+1} - \alpha \vert \\ ~~~\leq ~~~~\\ \log_{10} C \vert \alpha_{k} - \alpha \vert^2 \\ ~~~ = ~~~~ \\ \log_{10}(C) + 2\log_{10} \vert \alpha_{k} - \alpha \vert \\ ~~~ \leq ~~~~\\ 2\log_{10}\vert \alpha_{k} - \alpha \vert \end{array} \end{equation*}

and hence

\begin{equation*} \begin{array}[t]{c} \underbrace{ - \log_{10} \vert \alpha_{k+1} - \alpha \vert } \\ \mbox{number of correct} \\ \mbox{digits in }\alpha_{k+1} \end{array} \geq 2( \begin{array}[t]{c} \underbrace{ - \log_{10}\vert \alpha_{k} - \alpha \vert } \\ \mbox{number of correct} \\ \mbox{digits in }\alpha_{k} \end{array} ). \end{equation*}
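A numerical illustration of the digit-doubling claim (ours; the constants \(C = 0.5 \) and \(e_0 = 0.1 \) are our own choices): for the quadratically convergent recurrence \(e_{k+1} = C e_k^2 \) with \(C \lt 1 \text{,}\) the digit count \(-\log_{10} e_k \) at least doubles once \(e_k \lt 1 \text{.}\)

```python
import math

# Quadratic convergence: e_{k+1} = C * e_k**2 with illustrative constants.
C, e = 0.5, 0.1
digits = []
for _ in range(5):
    digits.append(-math.log10(e))  # number of correct decimal digits
    e = C * e * e
```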

Cubic convergence is dizzyingly fast: Eventually the number of correct digits triples from one iteration to the next.

For our analysis of the convergence of the Power Method, we define a convenient norm.

Homework 9.3.2.1.

Let \(X \in \C^{m \times m} \) be nonsingular. Define \(\| \cdot \|_{X^{-1}} : \C^m \rightarrow \R \) by \(\| y \|_{X^{-1}} = \| X^{-1} y \| \) for some given norm \(\| \cdot \|: \C^m \rightarrow \R \text{.}\)

ALWAYS/SOMETIMES/NEVER: \(\| \cdot \|_{X^{-1}} \) is a norm.

Solution

We need to show that

  • If \(y \neq 0 \) then \(\| y \|_{X^{-1}} \gt 0 \text{:}\)

    Let \(y \neq 0 \) and \(z = X^{-1} y \text{.}\) Then \(z \neq 0 \) since \(X \) is nonsingular. Hence

    \begin{equation*} \| y \|_{X^{-1}} = \| X^{-1} y \| = \| z \| \gt 0 . \end{equation*}
  • If \(\alpha \in \C \) and \(y \in \Cm\) then \(\| \alpha y \|_{X^{-1}} = \vert \alpha \vert \| y \|_{X^{-1}} \text{:}\)

    \begin{equation*} \| \alpha y \|_{X^{-1}} = \| {X^{-1}} ( \alpha y ) \| = \| \alpha {X^{-1}} y \| = \vert \alpha \vert \| {X^{-1}} y \| = \vert \alpha \vert \| y \|_{X^{-1}} . \end{equation*}
  • If \(x, y \in \Cm \) then \(\| x + y \|_{X^{-1}} \leq \| x \|_{X^{-1}} + \| y \|_{X^{-1}} \text{:}\)

    \begin{equation*} \begin{array}{l} \| x + y \|_{X^{-1}}\\ ~~~=~~~~\\ \| {X^{-1}} ( x + y ) \| \\ ~~~=~~~~\\ \| {X^{-1}} x + X^{-1} y \| \\ ~~~\leq~~~~\\ \| {X^{-1}} x \| + \| {X^{-1}} y \| \\ ~~~=~~~~\\ \| x \|_{X^{-1}} + \| y \|_{X^{-1}} . \end{array} \end{equation*}

What do we learn from this exercise? Recall that a vector \(z \) can alternatively be written as \(X ( X^{-1} z ) \text{,}\) so that the vector \(\hat z = X^{-1} z \) tells you how to represent \(z \) in the basis given by the columns of \(X \text{.}\) The exercise tells us that measuring a vector by applying a known norm to its representation in a new basis also yields a norm.
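A small numerical spot check of Homework 9.3.2.1 (ours): pick a concrete \(X^{-1} \) (the matrix entries below are our own choice, and we work in \(\R^2 \) with the 2-norm for simplicity) and verify the three norm properties on sample vectors.

```python
import math

# X^{-1} for some nonsingular X; the entries are an illustrative choice.
Xinv = [[2.0, 1.0],
        [0.0, 3.0]]

def matvec(M, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def norm2(v):
    """The 2-norm on R^2."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2)

def norm_Xinv(v):
    """|| v ||_{X^{-1}} = || X^{-1} v ||."""
    return norm2(matvec(Xinv, v))

x = [0.3, -0.7]
y = [-1.2, 0.4]
alpha = -2.5
```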

With this insight, we can perform our convergence analysis:

\begin{equation*} \begin{array}{l} v^{(k)} - \psi_0 x_0 \\ ~~~=~~~~\\ A^{k} v^{(0)} / \lambda_0^{k} - \psi_0 x_0 \\ ~~~=~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^{k} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) X^{-1} v^{(0)} - \psi_0 x_0 \\ ~~~=~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y - \psi_0 x_0 \\ ~~~=~~~~\\ X \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \end{array} \end{equation*}

Hence

\begin{equation*} X^{-1} ( v^{(k)} - \psi_0 x_0 )= \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \end{equation*}

and

\begin{equation*} X^{-1} (v^{(k+1)} - \psi_0 x_0 ) = \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \frac{\lambda_1}{\lambda_0} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \frac{\lambda_{m-1}}{\lambda_0} \end{array} \right) X^{-1} ( v^{(k)} - \psi_0 x_0 ). \end{equation*}

Now, let \(\| \cdot \| \) be a \(p\)-norm and \(\| \cdot \|_{X^{-1}} \) as defined in Homework 9.3.2.1. Then

\begin{equation*} \begin{array}{rcl} \| v^{(k+1)} - \psi_0 x_0 \|_{X^{-1}} \amp=\amp \| X^{-1} ( v^{(k+1)} - \psi_0 x_0 )\| \\ \amp=\amp \left\| \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \frac{\lambda_1}{\lambda_0} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \frac{\lambda_{m-1}}{\lambda_0} \end{array} \right) X^{-1} ( v^{(k)} - \psi_0 x_0 ) \right\| \\ \amp\leq\amp \left\vert \frac{\lambda_1}{\lambda_0} \right\vert \| X^{-1} ( v^{(k)} - \psi_0 x_0 ) \| = \left\vert \frac{\lambda_1}{\lambda_0} \right\vert \| v^{(k)} - \psi_0 x_0 \|_{X^{-1}}. \end{array} \end{equation*}

This shows that, in this norm, the convergence of \(v^{(k)} \) to \(\psi_0 x_0 \) is linear: the difference between the current approximation, \(v^{(k)} \text{,}\) and the vector in the direction of the desired eigenvector, \(\psi_0 x_0 \text{,}\) is reduced by at least a constant factor in each iteration.
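The linear rate \(\vert \lambda_1 / \lambda_0 \vert \) can be observed in a tiny experiment (a sketch of ours, not the text's): the matrix \(A \text{,}\) its eigenvalues, and the starting vector below are illustrative choices, and we use the idealized iteration \(v^{(k+1)} = A v^{(k)} / \lambda_0 \text{,}\) which assumes \(\lambda_0 \) is known.

```python
# Idealized power iteration v^{(k+1)} = A v^{(k)} / lambda_0 on a diagonal
# matrix, whose eigenvectors are the standard basis vectors.
lam0, lam1 = 2.0, 0.5                 # |lambda_0| > |lambda_1|; our choice
A = [[lam0, 0.0],
     [0.0, lam1]]
v = [1.0, 1.0]                        # v^{(0)}: a component of each eigenvector
x0 = [1.0, 0.0]                       # dominant eigenvector; here psi_0 = 1

ratios = []
for _ in range(8):
    err_old = abs(v[0] - x0[0]) + abs(v[1] - x0[1])   # 1-norm of the error
    v = [(A[0][0] * v[0] + A[0][1] * v[1]) / lam0,
         (A[1][0] * v[0] + A[1][1] * v[1]) / lam0]
    err_new = abs(v[0] - x0[0]) + abs(v[1] - x0[1])
    ratios.append(err_new / err_old)
# Each step shrinks the error by exactly |lambda_1 / lambda_0| = 0.25 here.
```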

Now, what if

\begin{equation*} \vert \lambda_0 \vert = \cdots = \vert \lambda_{n-1} \vert \gt \vert \lambda_n \vert \geq \cdots \geq \vert \lambda_{m-1} \vert ? \end{equation*}

By extending the above analysis one can easily show that \(v^{(k)} \) will converge to a vector in the subspace spanned by the eigenvectors associated with \(\lambda_0, \ldots , \lambda_{n-1} \text{.}\)

An important special case is when \(n = 2\text{:}\) if \(A \) is real-valued, then \(\lambda_0 \) may be complex-valued, in which case its conjugate, \(\bar \lambda_0 \text{,}\) is also an eigenvalue and hence has the same magnitude as \(\lambda_0 \text{.}\) We deduce that \(v^{(k)} \) will always be in the space spanned by the eigenvectors corresponding to \(\lambda_0 \) and \(\bar \lambda_0 \text{.}\)