Subsection 9.3.1 The Power Method

The Power Method is a simple method that, under mild conditions, yields an eigenvector corresponding to the eigenvalue that is largest in magnitude.

Throughout this section we will assume that a given matrix \(A \in \C^{m \times m} \) is not deficient and hence diagonalizable. This is a practical assumption because roundoff error encountered as the method is executed will invariably make the matrix diagonalizable. Thus, there exists a nonsingular matrix \(X \) and a diagonal matrix \(\Lambda\) such that \(A = X \Lambda X^{-1} \text{.}\) From the last section, we know that the columns of \(X \) equal eigenvectors of \(A \) and the elements on the diagonal of \(\Lambda \) equal the eigenvalues:

\begin{equation*} X = \left( \begin{array}{c | c | c | c} x_0 \amp x_1 \amp \cdots \amp x_{m-1} \end{array} \right) \quad \mbox{and} \quad \Lambda = \left( \begin{array}{c | c | c | c} \lambda_{0} \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \lambda_{1} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \lambda_{m-1} \end{array} \right) \end{equation*}

so that

\begin{equation*} A x_i = \lambda_i x_i \quad \mbox{for } i = 0, \ldots, m-1 . \end{equation*}

For most of this section we will assume that

\begin{equation*} \vert \lambda_0 \vert \gt \vert \lambda_1 \vert \geq \cdots \geq \vert \lambda_{m-1} \vert . \end{equation*}

In particular, \(\lambda_0 \) is the eigenvalue with maximal absolute value.

Subsubsection 9.3.1.1 First attempt

Let \(v^{(0)} \in \C^{m} \) be an "initial guess." Our (first attempt at the) Power Method iterates as follows:

\begin{equation*} \begin{array}{l} {\bf for~} k = 0, \ldots \\ ~~~ v^{(k+1)} = A v^{(k)} \\ {\bf endfor} \end{array} \end{equation*}

Clearly \(v^{(k)} = A^k v^{(0)} \text{.}\) Let

\begin{equation*} v^{(0)} = X y = \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} . \end{equation*}

What does this mean? We view the columns of \(X \) as forming a basis for \(\C^m \) and then the elements in vector \(y = X^{-1} v^{(0)} \) equal the coefficients for describing \(v^{(0)} \) in that basis. Then

\begin{equation*} \begin{array}{rcl} v^{(1)} = A v^{(0)} \amp=\amp A \left( \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} \right) \\ \amp=\amp \psi_0 \lambda_0 x_0 + \psi_1 \lambda_1 x_1 + \cdots + \psi_{m-1} \lambda_{m-1} x_{m-1}, \\ v^{(2)} = A v^{(1)} \amp=\amp \psi_0 \lambda_0^2 x_0 + \psi_1 \lambda_1^2 x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^2 x_{m-1}, \\ \amp \vdots \amp \\ v^{(k)} = A v^{(k-1)} \amp=\amp \psi_0 \lambda_0^k x_0 + \psi_1 \lambda_1^k x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^k x_{m-1}. \end{array} \end{equation*}

Now, as long as \(\psi_0 \neq 0 \text{,}\) clearly \(\psi_0 \lambda_0^k x_0 \) will eventually dominate, which means that \(v^{(k)} \) will start pointing in the direction of \(x_0 \text{.}\) In other words, it will start pointing in the direction of an eigenvector corresponding to \(\lambda_0 \text{.}\) The problem is that \(v^{(k)} \) will become infinitely long if \(\vert \lambda_0 \vert \gt 1 \) or infinitesimally short if \(\vert \lambda_0 \vert \lt 1 \text{.}\) All is good if \(\vert \lambda_0 \vert = 1 \text{.}\)
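A minimal NumPy sketch makes this concrete; the \(2 \times 2 \) matrix and starting vector below are illustrative choices, not part of the method. The direction of the iterates stabilizes while their length grows like \(\vert \lambda_0 \vert^k \text{:}\)

import numpy as np

# Illustrative diagonalizable matrix with eigenvalues 5 and 2,
# so |lambda_0| = 5 > 1 and the iterates grow without bound.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
v = np.array([1.0, 0.0])   # initial guess v^(0)

for k in range(1, 7):
    v = A @ v              # v^(k+1) = A v^(k)
    print(k, np.linalg.norm(v), v / np.linalg.norm(v))

The printed unit vectors approach \(x_0 = (1,1)/\sqrt{2} \text{,}\) while the norms grow by roughly a factor of \(5 \) per iteration.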

Subsubsection 9.3.1.2 Second attempt

Given an initial \(v^{(0)} \in \C^{m} \text{,}\) a second attempt at the Power Method iterates as follows:

\begin{equation*} \begin{array}{l} {\bf for~} k = 0, \ldots \\ ~~~ v^{(k+1)} = A v^{(k)} / \lambda_0 \\ {\bf endfor} \end{array} \end{equation*}

It is not hard to see that then

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp=\amp A v^{(k-1)} / \lambda_0 = A^k v^{(0)} / \lambda_0^k \\ \amp = \amp \psi_0 \left( \frac{\lambda_0}{\lambda_0} \right) ^k x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right) ^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1} \\ \amp = \amp \psi_0 x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right) ^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1}. \end{array} \end{equation*}

Clearly \(\lim_{k \rightarrow \infty} v^{(k)} = \psi_0 x_0 \text{,}\) as long as \(\psi_0 \neq 0 \text{,}\) since \(\left\vert \frac{ \lambda_j }{ \lambda_0 } \right\vert \lt 1 \) for \(j \gt 0 \text{.}\)

Another way of stating this is to notice that

\begin{equation*} A^k = \begin{array}[t]{c} \underbrace{( A A \cdots A )} \\ k \mbox{ times} \end{array} = \begin{array}[t]{c} \underbrace{( X \Lambda X^{-1}) ( X \Lambda X^{-1}) \cdots ( X \Lambda X^{-1} )} \\ k \mbox{ times} \end{array} = X \Lambda^k X^{-1}, \end{equation*}

so that

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp=\amp A^k v^{(0)} / \lambda_0^k \\ \amp=\amp A^k X y / \lambda_0^k \\ \amp=\amp X \Lambda^k X^{-1} X y / \lambda_0^k \\ \amp=\amp X \Lambda^k y / \lambda_0^k \\ \amp=\amp X \left( \Lambda^k / \lambda_0^k \right) y = X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y. \end{array} \end{equation*}

Now, since \(\left\vert \frac{\lambda_{j}}{\lambda_0} \right\vert \lt 1 \) for \(j \gt 0 \) we can argue that

\begin{equation*} \begin{array}{l} \lim_{k \rightarrow \infty} v^{(k)} \\ ~~~= ~~~~\\ \lim_{k \rightarrow \infty} X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \\ ~~~= ~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp 0 \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp 0 \end{array} \right) y \\ ~~~=~~~~ \\ X \psi_0 e_0 \\ ~~~= ~~~~\\ \psi_0 X e_0 = \psi_0 x_0. \end{array} \end{equation*}

Thus, as long as \(\psi_0 \neq 0 \) (which means \(v^{(0)} \) must have a component in the direction of \(x_0 \)), this method will eventually yield a vector in the direction of \(x_0 \text{.}\) However, this time the problem is that we don't know \(\lambda_0 \) when we start.
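Continuing the illustrative example from before, and pretending for a moment that \(\lambda_0 = 5 \) is known, a sketch of this second attempt shows the iterates settling down instead of growing:

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # eigenvalues 5 and 2, as before
lambda0 = 5.0                # assumed known, only for this experiment
v = np.array([1.0, 0.0])

for k in range(20):
    v = (A @ v) / lambda0    # v^(k+1) = A v^(k) / lambda_0

print(v)                     # approaches psi_0 x_0

Taking \(x_0 = (1, 1) \) and \(x_1 = (1, -2) \text{,}\) this starting vector has \(\psi_0 = 2/3 \text{,}\) and the iterates indeed converge to \((2/3, 2/3) \text{.}\)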

Subsubsection 9.3.1.3 Convergence

Before we make the algorithm practical, let us examine how fast the iteration converges. This requires a few definitions regarding rates of convergence.

Definition 9.3.1.1. Convergence of a sequence of scalars.

Let \(\alpha_0, \alpha_1 , \alpha_2, \ldots \in \R \) be an infinite sequence of scalars. Then \(\alpha_k \) is said to converge to \(\alpha \) if

\begin{equation*} \lim_{k \rightarrow \infty} \vert \alpha_k - \alpha \vert = 0 . \end{equation*}
Definition 9.3.1.2. Convergence of a sequence of vectors.

Let \(x_0, x_1 , x_2, \ldots \in \C^m \) be an infinite sequence of vectors. Then \(x_k \) is said to converge to \(x \) if for any norm \(\| \cdot \| \)

\begin{equation*} \lim_{k \rightarrow \infty} \| x_k - x \| = 0. \end{equation*}

Because of the equivalence of norms, discussed earlier in the course, if a sequence of vectors converges in one norm, then it converges in all norms.

Definition 9.3.1.3. Rate of convergence.

Let \(\alpha_0, \alpha_1 , \alpha_2, \ldots \in \R \) be an infinite sequence of scalars that converges to \(\alpha \text{.}\) Then

  • \(\alpha_k \) is said to converge linearly to \(\alpha \) if for sufficiently large \(k \)

    \begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert \end{equation*}

    for some constant \(C \lt 1 \text{.}\)

  • \(\alpha_k \) is said to converge superlinearly to \(\alpha \) if for sufficiently large \(k \)

    \begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C_k \vert \alpha_{k} - \alpha \vert \end{equation*}

    with \(C_k \rightarrow 0 \text{.}\)

  • \(\alpha_k \) is said to converge quadratically to \(\alpha \) if for sufficiently large \(k \)

    \begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert ^2 \end{equation*}

    for some constant \(C \text{.}\)

  • \(\alpha_k \) is said to converge superquadratically to \(\alpha \) if for sufficiently large \(k \)

    \begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C_k \vert \alpha_{k} - \alpha \vert ^2 \end{equation*}

    with \(C_k \rightarrow 0 \text{.}\)

  • \(\alpha_k \) is said to converge cubically to \(\alpha \) if for large enough \(k \)

    \begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert ^3 \end{equation*}

    for some constant \(C \text{.}\)

Linear convergence can be slow. Let's say that for \(k \geq K \) we observe that

\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert . \end{equation*}

Then, clearly, \(\vert \alpha_{k+n} - \alpha\vert \leq C^n \vert \alpha_{k} - \alpha\vert \text{.}\) If \(C = 0.99 \text{,}\) progress may be very, very slow. If \(\vert \alpha_k - \alpha \vert = 1 \text{,}\) then

\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha \vert \amp\leq\amp 0.99000 \\ \vert \alpha_{k+2} - \alpha \vert \amp\leq\amp 0.98010 \\ \vert \alpha_{k+3} - \alpha \vert \amp\leq\amp 0.97030 \\ \vert \alpha_{k+4} - \alpha \vert \amp\leq\amp 0.96060 \\ \vert \alpha_{k+5} - \alpha \vert \amp\leq\amp 0.95099 \\ \vert \alpha_{k+6} - \alpha \vert \amp\leq\amp 0.94148 \\ \vert \alpha_{k+7} - \alpha \vert \amp\leq\amp 0.93206 \\ \vert \alpha_{k+8} - \alpha \vert \amp\leq\amp 0.92274 \\ \vert \alpha_{k+9} - \alpha \vert \amp\leq\amp 0.91351 \end{array} \end{equation*}

Quadratic convergence is fast. Now

\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha\vert \amp\leq\amp C \vert \alpha_{k} - \alpha\vert ^2 \\ \vert \alpha_{k+2} - \alpha\vert \amp\leq\amp C \vert \alpha_{k+1} - \alpha\vert ^2 \leq C ( C \vert \alpha_{k} - \alpha\vert ^2 )^2 = C^3 \vert \alpha_{k} - \alpha\vert ^4 \\ \vert \alpha_{k+3} - \alpha\vert \amp\leq\amp C \vert \alpha_{k+2} - \alpha\vert ^2 \leq C ( C^3 \vert \alpha_{k} - \alpha\vert ^4 )^2 = C^7 \vert \alpha_{k} - \alpha\vert ^8 \\ \amp \vdots \amp \\ \vert \alpha_{k+n} - \alpha\vert \amp\leq\amp C^{2^n-1} \vert \alpha_{k} - \alpha\vert ^{2^n} \end{array} \end{equation*}

Even if \(C = 0.99 \) and \(\vert \alpha_k - \alpha \vert = 1 \text{,}\) we now find that

\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha \vert \amp\leq\amp 0.99000 \\ \vert \alpha_{k+2} - \alpha \vert \amp\leq\amp 0.970299 \\ \vert \alpha_{k+3} - \alpha \vert \amp\leq\amp 0.932065 \\ \vert \alpha_{k+4} - \alpha \vert \amp\leq\amp 0.860058 \\ \vert \alpha_{k+5} - \alpha \vert \amp\leq\amp 0.732303 \\ \vert \alpha_{k+6} - \alpha \vert \amp\leq\amp 0.530905 \\ \vert \alpha_{k+7} - \alpha \vert \amp\leq\amp 0.279042 \\ \vert \alpha_{k+8} - \alpha \vert \amp\leq\amp 0.077085 \\ \vert \alpha_{k+9} - \alpha \vert \amp\leq\amp 0.005882 \\ \vert \alpha_{k+10} - \alpha \vert \amp\leq\amp 0.000034 \end{array} \end{equation*}

If we consider \(\alpha \) the correct result then, eventually, the number of correct digits roughly doubles in each iteration. This can be explained as follows: If \(\vert \alpha_{k} - \alpha \vert \lt 1 \text{,}\) then the number of correct decimal digits is given by

\begin{equation*} - \log_{10} \vert \alpha_{k} - \alpha \vert . \end{equation*}

Since \(\log_{10} \) is a monotonically increasing function,

\begin{equation*} \log_{10} \vert \alpha_{k+1} - \alpha \vert \leq \log_{10} \left( C \vert \alpha_{k} - \alpha \vert^2 \right) = \log_{10}( C ) + 2\log_{10} \vert \alpha_{k} - \alpha \vert \leq 2\log_{10} \vert \alpha_{k} - \alpha \vert \end{equation*}

and hence, since \(\log_{10}( C ) \leq 0 \) when \(C \leq 1 \text{,}\)

\begin{equation*} \begin{array}[t]{c} \underbrace{ - \log_{10} \vert \alpha_{k+1} - \alpha \vert } \\ \mbox{number of correct} \\ \mbox{digits in } \alpha_{k+1} \end{array} \geq 2 \left( \begin{array}[t]{c} \underbrace{ - \log_{10} \vert \alpha_{k} - \alpha \vert } \\ \mbox{number of correct} \\ \mbox{digits in } \alpha_{k} \end{array} \right). \end{equation*}

Cubic convergence is dizzyingly fast: Eventually the number of correct digits triples from one iteration to the next.
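The two tables of bounds above can be reproduced with a short script; a sketch:

# Error bounds for C = 0.99, starting from |alpha_k - alpha| = 1.
C = 0.99
e_lin, e_quad = 1.0, 1.0
for n in range(1, 11):
    e_lin = C * e_lin           # linear:    e_{k+1} <= C e_k
    e_quad = C * e_quad ** 2    # quadratic: e_{k+1} <= C e_k^2
    print(f"{n:2d}   linear: {e_lin:.5f}   quadratic: {e_quad:.6f}")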

For our analysis of the convergence of the Power Method, we define a convenient norm.

Lemma 9.3.1.4.

Let \(\| \cdot \| \) be a norm on \(\C^m \) and let \(X \in \C^{m \times m} \) be nonsingular. Then the function \(\| \cdot \|_X : \C^m \rightarrow \R \) defined by \(\| y \|_X = \| X y \| \) is a norm.

Homework 9.3.1.1.

Prove Lemma 9.3.1.4.

Solution

We need to show that

  • If \(y \neq 0 \) then \(\| y \|_X \gt 0 \text{:}\)

    Let \(y \neq 0 \) and \(z = X y \text{.}\) Then \(z \neq 0 \) since \(X \) is nonsingular. Hence

    \begin{equation*} \| y \|_X = \| X y \| = \| z \| \gt 0 . \end{equation*}
  • If \(\alpha \in \C \) and \(y \in \C^m\) then \(\| \alpha y \|_X = \vert \alpha \vert \| y \|_X \text{:}\)

    \begin{equation*} \| \alpha y \|_X = \| X ( \alpha y ) \| = \| \alpha X y \| = \vert \alpha \vert \| X y \| = \vert \alpha \vert \| y \|_X . \end{equation*}
  • If \(x, y \in \C^m \) then \(\| x + y \|_X \leq \| x \|_X + \| y \|_X \text{:}\)

    \begin{equation*} \| x + y \|_X = \| X ( x + y ) \| = \| X x + X y \| \leq \| X x \| + \| X y \| = \| x \|_X + \| y \|_X . \end{equation*}
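The three properties can also be sanity checked numerically on random data; a quick check (not a substitute for the proof above):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))   # nonsingular with probability 1

def norm_X(y):
    return np.linalg.norm(X @ y)  # || y ||_X = || X y ||, using the 2-norm

x, y, alpha = rng.standard_normal(4), rng.standard_normal(4), -2.5
assert norm_X(y) > 0                                           # positivity
assert np.isclose(norm_X(alpha * y), abs(alpha) * norm_X(y))   # homogeneity
assert norm_X(x + y) <= norm_X(x) + norm_X(y) + 1e-12          # triangle inequality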

With this new norm, we can perform our convergence analysis:

\begin{equation*} \begin{array}{l} v^{(k)} - \psi_0 x_0 \\ ~~~=~~~~\\ A^k v^{(0)} / \lambda_0^k - \psi_0 x_0 \\ ~~~=~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) X^{-1} v^{(0)} - \psi_0 x_0 \\ ~~~=~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y - \psi_0 x_0 \\ ~~~=~~~~\\ X \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \end{array} \end{equation*}

Hence

\begin{equation*} X^{-1} ( v^{(k)} - \psi_0 x_0 )= \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \end{equation*}

and

\begin{equation*} X^{-1} (v^{(k+1)} - \psi_0 x_0 ) = \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \frac{\lambda_1}{\lambda_0} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \frac{\lambda_{m-1}}{\lambda_0} \end{array} \right) X^{-1} ( v^{(k)} - \psi_0 x_0 ). \end{equation*}

Now, let \(\| \cdot \| \) be a p-norm and its induced matrix norm, and let \(\| \cdot \|_{X^{-1}} \) be defined as in Lemma 9.3.1.4. (We choose a p-norm to make sure that the norm of a diagonal matrix equals the magnitude of its largest diagonal element.) Then

\begin{equation*} \begin{array}{rcl} \| v^{(k+1)} - \psi_0 x_0 \|_{X^{-1}} \amp=\amp \| X^{-1} ( v^{(k+1)} - \psi_0 x_0 )\| \\ \amp=\amp \left\| \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \frac{\lambda_1}{\lambda_0} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \frac{\lambda_{m-1}}{\lambda_0} \end{array} \right) X^{-1} ( v^{(k)} - \psi_0 x_0 ) \right\| \\ \amp\leq\amp \left\vert \frac{\lambda_1}{\lambda_0} \right\vert \| X^{-1} ( v^{(k)} - \psi_0 x_0 ) \| = \left\vert \frac{\lambda_1}{\lambda_0} \right\vert \| v^{(k)} - \psi_0 x_0 \|_{X^{-1}}. \end{array} \end{equation*}

This shows that, in this norm, the convergence of \(v^{(k)} \) to \(\psi_0 x_0 \) is linear: the difference between the current approximation, \(v^{(k)} \text{,}\) and the solution, \(\psi_0 x_0 \text{,}\) is reduced by at least a constant factor in each iteration.

Subsubsection 9.3.1.4 A practical Power Method

The following algorithm, known as the Power Method, avoids the problem of \(v^{(k)} \) growing or shrinking in length without requiring \(\lambda_0 \) to be known, by scaling it to be of unit length at each step:

\begin{equation*} \begin{array}{l} {\bf for~} k = 0, \ldots \\ ~~~v^{(k+1)} = A v^{(k)} \\ ~~~ v^{(k+1)} = v^{(k+1)} / \| v^{(k+1)} \| \\ {\bf endfor} \end{array} \end{equation*}

Its convergence is identical to that of the version analyzed in Subsubsection 9.3.1.3.
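A minimal NumPy realization of the practical Power Method follows; the matrix, starting vector, and iteration count are illustrative assumptions:

import numpy as np

def power_method(A, v0, num_iters=50):
    # Return a unit vector approximating an eigenvector associated
    # with the eigenvalue of A that is largest in magnitude.
    v = v0 / np.linalg.norm(v0)
    for _ in range(num_iters):
        v = A @ v
        v = v / np.linalg.norm(v)   # rescale to unit length each step
    return v

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])          # eigenvalues 5 and 2
v = power_method(A, np.array([1.0, 0.0]))
print(v)                            # approximately (1, 1)/sqrt(2)

For this matrix, the error in the direction of \(v^{(k)} \) shrinks by roughly the factor \(\vert \lambda_1 / \lambda_0 \vert = 2/5 \) per iteration, consistent with the analysis in the previous unit.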

Subsubsection 9.3.1.5 The Rayleigh quotient

A question is how to extract an approximation of \(\lambda_0 \) given an approximation of \(x_0 \text{.}\) The following observation provides the answer:

Let \(x \) be an eigenvector of \(A \) and \(\lambda \) the associated eigenvalue. Then \(A x = \lambda x \text{.}\) Multiplying on the left by \(x^H \) yields \(x^H A x = \lambda x^H x \) which, since \(x \neq 0 \) means that \(\lambda = x^H A x / ( x^H x ) \text{.}\)

For a given vector \(x \neq 0 \text{,}\) the ratio

\begin{equation*} \frac{ x^H A x }{x^H x} \end{equation*}

is known as the Rayleigh quotient.

Clearly this ratio, as a function of \(x \text{,}\) is continuous. Hence an approximation of the eigenvector \(x_0 \) associated with \(\lambda_0 \text{,}\) when plugged into this formula, yields an approximation of \(\lambda_0 \text{.}\)
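Combining the Rayleigh quotient with the practical Power Method yields an eigenvalue estimate at every step; a sketch, reusing the illustrative matrix from before:

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # eigenvalues 5 and 2
v = np.array([1.0, 0.0])

for k in range(10):
    v = A @ v
    v = v / np.linalg.norm(v)
    rho = (v.conj() @ A @ v) / (v.conj() @ v)   # x^H A x / (x^H x)
    print(k, rho)                               # approaches lambda_0 = 5

Since \(v \) is rescaled to unit length at each step, the denominator equals one; it is kept only to match the general formula.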

Subsubsection 9.3.1.6 What if \(\vert \lambda_0 \vert = \vert \lambda_1 \vert \)?

Now, what if

\begin{equation*} \vert \lambda_0 \vert = \cdots = \vert \lambda_{n-1} \vert \gt \vert \lambda_n \vert \geq \cdots \geq \vert \lambda_{m-1} \vert ? \end{equation*}

By extending the above analysis one can easily show that \(v^{(k)} \) will converge to a vector in the subspace spanned by the eigenvectors associated with \(\lambda_0, \ldots , \lambda_{n-1} \text{.}\)

An important special case is when \(n = 2\text{:}\) if \(A \) is real valued then \(\lambda_0 \) may be complex valued in which case its conjugate, \(\bar \lambda_0 \text{,}\) is also an eigenvalue and hence has the same magnitude as \(\lambda_0 \text{.}\) We deduce that \(v^{(k)} \) will always be in the space spanned by the eigenvectors corresponding to \(\lambda_0 \) and \(\bar \lambda_0 \text{.}\)
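A small experiment illustrates this last point; the matrix below is an illustrative choice whose dominant eigenvalues are the complex conjugate pair \(1 \pm i \) (so \(n = 2 \)), with subdominant eigenvalue \(0.5 \text{.}\) The component of the iterates outside the dominant subspace dies out, but within that subspace the direction keeps rotating rather than converging:

import numpy as np

# Real matrix with dominant complex conjugate eigenvalues 1 +/- i
# (magnitude sqrt(2)) and a third, subdominant eigenvalue 0.5.
A = np.array([[1.0, -1.0, 0.0],
              [1.0,  1.0, 0.0],
              [0.0,  0.0, 0.5]])
v = np.array([1.0, 0.3, 1.0])

for k in range(8):
    v = A @ v
    v = v / np.linalg.norm(v)
    print(k, v)   # third component vanishes; first two keep changing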