## Subsection 9.3.1 The Power Method

The Power Method is a simple method that under mild conditions yields a vector corresponding to the eigenvalue that is largest in magnitude.

Throughout this section we will assume that a given matrix $A \in \C^{m \times m}$ is not deficient and hence diagonalizable. This is a practical assumption because roundoff error encountered as the method is executed will invariably make the matrix diagonalizable. Thus, there exists a nonsingular matrix $X$ and diagonal matrix $\Lambda$ such that $A = X \Lambda X^{-1} \text{.}$ From the last section, we know that the columns of $X$ equal eigenvectors of $A$ and the elements on the diagonal of $\Lambda$ equal the eigenvalues:

\begin{equation*} X = \left( \begin{array}{c | c | c | c} x_0 \amp x_1 \amp \cdots \amp x_{m-1} \end{array} \right) \quad \mbox{and} \quad \Lambda = \left( \begin{array}{c | c | c | c} \lambda_{0} \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \lambda_{1} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \lambda_{m-1} \end{array} \right) \end{equation*}

so that

\begin{equation*} A x_i = \lambda_i x_i \quad \mbox{for } i = 0, \ldots, m-1 . \end{equation*}

For most of this section we will assume that

\begin{equation*} \vert \lambda_0 \vert \gt \vert \lambda_1 \vert \geq \cdots \geq \vert \lambda_{m-1} \vert . \end{equation*}

In particular, $\lambda_0$ is the eigenvalue with maximal absolute value.

### Subsubsection 9.3.1.1 First attempt

Let $v^{(0)} \in \C^{m}$ be an "initial guess". Our (first attempt at the) Power Method iterates as follows:

\begin{equation*} \begin{array}{l} {\bf for~} k = 0, \ldots \\ ~~~ v^{(k+1)} = A v^{(k)} \\ {\bf endfor} \end{array} \end{equation*}

Clearly $v^{(k)} = A^k v^{(0)} \text{.}$ Let

\begin{equation*} v^{(0)} = X y = \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} . \end{equation*}

What does this mean? We view the columns of $X$ as forming a basis for $\C^m$ and then the elements in vector $y = X^{-1} v^{(0)}$ equal the coefficients for describing $v^{(0)}$ in that basis. Then

\begin{equation*} \begin{array}{rcl} v^{(1)} = A v^{(0)} \amp=\amp A \left( \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} \right) \\ \amp=\amp \psi_0 \lambda_0 x_0 + \psi_1 \lambda_1 x_1 + \cdots + \psi_{m-1} \lambda_{m-1} x_{m-1}, \\ v^{(2)} = A v^{(1)} \amp=\amp \psi_0 \lambda_0^2 x_0 + \psi_1 \lambda_1^2 x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^2 x_{m-1}, \\ \amp \vdots \amp \\ v^{(k)} = A v^{(k-1)} \amp=\amp \psi_0 \lambda_0^k x_0 + \psi_1 \lambda_1^k x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^k x_{m-1}. \end{array} \end{equation*}

Now, as long as $\psi_0 \neq 0 \text{,}$ clearly $\psi_0 \lambda_0^k x_0$ will eventually dominate, which means that $v^{(k)}$ will start pointing in the direction of $x_0 \text{.}$ In other words, it will start pointing in the direction of an eigenvector corresponding to $\lambda_0 \text{.}$ The problem is that it will grow infinitely long if $\vert \lambda_0 \vert \gt 1$ or become infinitesimally short if $\vert \lambda_0 \vert \lt 1 \text{.}$ All is good if $\vert \lambda_0 \vert = 1 \text{.}$
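This behavior is easy to observe numerically. Below is a minimal sketch of the first attempt, assuming NumPy; the matrix and starting vector are hypothetical examples, not from the text. The iterate aligns with the direction of $x_0$, while its length grows without bound since $\vert \lambda_0 \vert = 2 \gt 1 \text{:}$

```python
import numpy as np

# Hypothetical diagonal example: eigenvalues 2 and 0.5, eigenvectors e_0 and e_1.
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
v = np.array([1.0, 1.0])   # initial guess v^(0), with psi_0 = psi_1 = 1

for k in range(10):
    v = A @ v              # v^(k+1) = A v^(k)

# v = (2^10, 0.5^10): the direction approaches e_0, but the length blows up.
direction = v / np.linalg.norm(v)
```

After ten steps the first component has grown to $2^{10} = 1024$ while the second has all but vanished, so the normalized direction is essentially $e_0 \text{.}$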

### Subsubsection 9.3.1.2 Second attempt

Given an initial $v^{(0)} \in \C^{m} \text{,}$ a second attempt at the Power Method iterates as follows:

\begin{equation*} \begin{array}{l} {\bf for~} k = 0, \ldots \\ ~~~ v^{(k+1)} = A v^{(k)} / \lambda_0 \\ {\bf endfor} \end{array} \end{equation*}

It is not hard to see that then

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp=\amp A v^{(k-1)} / \lambda_0 = A^k v^{(0)} / \lambda_0^k \\ \amp = \amp \psi_0 \left( \frac{\lambda_0}{\lambda_0} \right) ^k x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right) ^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1} \\ \amp = \amp \psi_0 x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right) ^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1}. \end{array} \end{equation*}

Clearly $\lim_{k \rightarrow \infty} v^{(k)} = \psi_0 x_0 \text{,}$ as long as $\psi_0 \neq 0 \text{,}$ since $\left\vert \frac{ \lambda_j }{ \lambda_0 } \right\vert \lt 1$ for $j \gt 0 \text{.}$
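If $\lambda_0$ were known, this scaled iteration could be sketched as follows (a hypothetical NumPy example; in practice $\lambda_0$ is exactly what we are trying to compute). Here $\lambda_0 = 2 \text{,}$ $x_0 = (1, 0)^T \text{,}$ and the iterate converges to $\psi_0 x_0 \text{:}$

```python
import numpy as np

# Hypothetical example: upper triangular, so the eigenvalues 2 and 0.5 appear on
# the diagonal; x_0 = (1, 0)^T is the eigenvector associated with lambda_0 = 2.
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])
lambda0 = 2.0              # assumed known for this (impractical) variant
v = np.array([1.0, 1.0])   # initial guess v^(0)

for k in range(50):
    v = (A @ v) / lambda0  # v^(k+1) = A v^(k) / lambda_0

# With x_1 scaled as (1, -1.5)^T, one finds psi_0 = 5/3, so v converges to
# psi_0 x_0 = (5/3, 0)^T; the error shrinks like (0.5/2)^k.
```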

Another way of stating this is to notice that

\begin{equation*} A^k = \begin{array}[t]{c} \underbrace{A A \cdots A} \\ k \mbox{ times} \end{array} = \begin{array}[t]{c} \underbrace{( X \Lambda X^{-1}) ( X \Lambda X^{-1}) \cdots ( X \Lambda X^{-1} )} \\ k \mbox{ times} \end{array} = X \Lambda^k X^{-1} \end{equation*}

so that

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp=\amp A^k v^{(0)} / \lambda_0^k \\ \amp=\amp A^k X y / \lambda_0^k \\ \amp=\amp X \Lambda^k X^{-1} X y / \lambda_0^k \\ \amp=\amp X \Lambda^k y / \lambda_0^k \\ \amp=\amp X \left( \Lambda^k / \lambda_0^k \right) y = X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y. \end{array} \end{equation*}

Now, since $\left\vert \frac{\lambda_{j}}{\lambda_0} \right\vert \lt 1$ for $j \gt 0 \text{,}$ we can argue that

\begin{equation*} \begin{array}{l} \lim_{k \rightarrow \infty} v^{(k)} \\ ~~~= ~~~~\\ \lim_{k \rightarrow \infty} X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \\ ~~~= ~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp 0 \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp 0 \end{array} \right) y \\ ~~~=~~~~ \\ X \psi_0 e_0 \\ ~~~= ~~~~\\ \psi_0 X e_0 = \psi_0 x_0. \end{array} \end{equation*}

Thus, as long as $\psi_0 \neq 0$ (which means $v^{(0)}$ must have a component in the direction of $x_0$) this method will eventually yield a vector in the direction of $x_0 \text{.}$ However, this time the problem is that we don't know $\lambda_0$ when we start.
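The identity $A^k = X \Lambda^k X^{-1}$ used above is easy to check numerically (a sketch assuming NumPy; the matrix is a hypothetical random example):

```python
import numpy as np

rng = np.random.default_rng(12345)
A = rng.standard_normal((4, 4))    # a generic random matrix is diagonalizable

lam, X = np.linalg.eig(A)          # columns of X are eigenvectors; lam holds eigenvalues
k = 5
Ak_direct = np.linalg.matrix_power(A, k)
Ak_eig = (X * lam**k) @ np.linalg.inv(X)   # X Lambda^k X^{-1}: scale columns of X, then apply X^{-1}

err = np.max(np.abs(Ak_direct - Ak_eig))
```

Note that `lam` and `X` may be complex even for real $A \text{;}$ the reconstruction then agrees with $A^k$ up to roundoff.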

### Subsubsection 9.3.1.3 Convergence

Before we make the algorithm practical, let us examine how fast the iteration converges. This requires a few definitions regarding rates of convergence.

###### Definition 9.3.1.1. Convergence of a sequence of scalars.

Let $\alpha_0, \alpha_1 , \alpha_2, \ldots \in \R$ be an infinite sequence of scalars. Then $\alpha_k$ is said to converge to $\alpha$ if

\begin{equation*} \lim_{k \rightarrow \infty} \vert \alpha_k - \alpha \vert = 0 . \end{equation*}
###### Definition 9.3.1.2. Convergence of a sequence of vectors.

Let $x_0, x_1 , x_2, \ldots \in \C^m$ be an infinite sequence of vectors. Then $x_k$ is said to converge to $x$ if for any norm $\| \cdot \|$

\begin{equation*} \lim_{k \rightarrow \infty} \| x_k - x \| = 0. \end{equation*}

Because of the equivalence of norms, discussed earlier in the text, if a sequence of vectors converges in one norm, then it converges in all norms.

###### Definition 9.3.1.3. Rate of convergence.

Let $\alpha_0, \alpha_1 , \alpha_2, \ldots \in \R$ be an infinite sequence of scalars that converges to $\alpha \text{.}$ Then

• $\alpha_k$ is said to converge linearly to $\alpha$ if for sufficiently large $k$

\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert \end{equation*}

for some constant $C \lt 1 \text{.}$

• $\alpha_k$ is said to converge superlinearly to $\alpha$ if for sufficiently large $k$

\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C_k \vert \alpha_{k} - \alpha \vert \end{equation*}

with $C_k \rightarrow 0 \text{.}$

• $\alpha_k$ is said to converge quadratically to $\alpha$ if for sufficiently large $k$

\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert ^2 \end{equation*}

for some constant $C \text{.}$

• $\alpha_k$ is said to converge superquadratically to $\alpha$ if for sufficiently large $k$

\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C_k \vert \alpha_{k} - \alpha \vert ^2 \end{equation*}

with $C_k \rightarrow 0 \text{.}$

• $\alpha_k$ is said to converge cubically to $\alpha$ if for large enough $k$

\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert ^3 \end{equation*}

for some constant $C \text{.}$

Linear convergence can be slow. Let's say that for $k \geq K$ we observe that

\begin{equation*} \vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert . \end{equation*}

Then, clearly, $\vert \alpha_{k+n} - \alpha\vert \leq C^n \vert \alpha_{k} - \alpha\vert \text{.}$ If $C = 0.99 \text{,}$ progress may be very, very slow. If $\vert \alpha_k - \alpha \vert = 1 \text{,}$ then

\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha \vert \amp\leq\amp 0.99000 \\ \vert \alpha_{k+2} - \alpha \vert \amp\leq\amp 0.98010 \\ \vert \alpha_{k+3} - \alpha \vert \amp\leq\amp 0.97030 \\ \vert \alpha_{k+4} - \alpha \vert \amp\leq\amp 0.96060 \\ \vert \alpha_{k+5} - \alpha \vert \amp\leq\amp 0.95099 \\ \vert \alpha_{k+6} - \alpha \vert \amp\leq\amp 0.94148 \\ \vert \alpha_{k+7} - \alpha \vert \amp\leq\amp 0.93206 \\ \vert \alpha_{k+8} - \alpha \vert \amp\leq\amp 0.92274 \\ \vert \alpha_{k+9} - \alpha \vert \amp\leq\amp 0.91351 \end{array} \end{equation*}

Quadratic convergence is fast. Say that for $k \geq K$ we observe that $\vert \alpha_{k+1} - \alpha \vert \leq C \vert \alpha_{k} - \alpha \vert^2 \text{.}$ Then

\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha\vert \amp\leq\amp C \vert \alpha_{k} - \alpha\vert ^2 \\ \vert \alpha_{k+2} - \alpha\vert \amp\leq\amp C \vert \alpha_{k+1} - \alpha\vert ^2 \leq C ( C \vert \alpha_{k} - \alpha\vert ^2 )^2 = C^3 \vert \alpha_{k} - \alpha\vert ^4 \\ \vert \alpha_{k+3} - \alpha\vert \amp\leq\amp C \vert \alpha_{k+2} - \alpha\vert ^2 \leq C ( C^3 \vert \alpha_{k} - \alpha\vert ^4 )^2 = C^7 \vert \alpha_{k} - \alpha\vert ^8 \\ \amp \vdots \amp \\ \vert \alpha_{k+n} - \alpha\vert \amp\leq\amp C^{2^n-1} \vert \alpha_{k} - \alpha\vert ^{2^n} \end{array} \end{equation*}

Even with $C = 0.99$ and $\vert \alpha_k - \alpha \vert = 1 \text{,}$ we then find that

\begin{equation*} \begin{array}{rcl} \vert \alpha_{k+1} - \alpha \vert \amp\leq\amp 0.99000 \\ \vert \alpha_{k+2} - \alpha \vert \amp\leq\amp 0.970299 \\ \vert \alpha_{k+3} - \alpha \vert \amp\leq\amp 0.932065 \\ \vert \alpha_{k+4} - \alpha \vert \amp\leq\amp 0.860058 \\ \vert \alpha_{k+5} - \alpha \vert \amp\leq\amp 0.732303 \\ \vert \alpha_{k+6} - \alpha \vert \amp\leq\amp 0.530905 \\ \vert \alpha_{k+7} - \alpha \vert \amp\leq\amp 0.279042 \\ \vert \alpha_{k+8} - \alpha \vert \amp\leq\amp 0.077085 \\ \vert \alpha_{k+9} - \alpha \vert \amp\leq\amp 0.005882 \\ \vert \alpha_{k+10} - \alpha \vert \amp\leq\amp 0.000034 \end{array} \end{equation*}
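The bounds in the two tables can be reproduced directly (a small Python sketch):

```python
# Bounds after n further steps, starting from |alpha_k - alpha| = 1 with C = 0.99:
# linear convergence gives C^n, quadratic convergence gives C^(2^n - 1).
C = 0.99
linear = [C**n for n in range(1, 11)]
quadratic = [C**(2**n - 1) for n in range(1, 11)]
```

After ten steps the linear bound has barely moved (about $0.904$), while the quadratic bound has collapsed to roughly $3.4 \times 10^{-5} \text{.}$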

If we consider $\alpha$ the correct result then, eventually, the number of correct digits roughly doubles in each iteration. This can be explained as follows: If $\vert \alpha_{k} - \alpha \vert \lt 1 \text{,}$ then the number of correct decimal digits is given by

\begin{equation*} - \log_{10} \vert \alpha_{k} - \alpha \vert . \end{equation*}

Since $\log_{10}$ is a monotonically increasing function,

\begin{equation*} \log_{10} \vert \alpha_{k+1} - \alpha \vert \leq \log_{10} \left( C \vert \alpha_{k} - \alpha \vert^2 \right) = \log_{10}(C) + 2\log_{10} \vert \alpha_{k} - \alpha \vert \leq 2\log_{10} \vert \alpha_{k} - \alpha \vert , \end{equation*}

where the last step uses the fact that $\log_{10}( C ) \leq 0$ when $C \leq 1 \text{.}$ Hence

\begin{equation*} \begin{array}[t]{c} \underbrace{ - \log_{10} \vert \alpha_{k+1} - \alpha \vert } \\ \mbox{number of correct} \\ \mbox{digits in } \alpha_{k+1} \end{array} \geq 2 \left( \begin{array}[t]{c} \underbrace{ - \log_{10} \vert \alpha_{k} - \alpha \vert } \\ \mbox{number of correct} \\ \mbox{digits in } \alpha_{k} \end{array} \right). \end{equation*}

Cubic convergence is dizzyingly fast: Eventually the number of correct digits triples from one iteration to the next.

For our analysis of the convergence of the Power Method, we define a convenient norm.

###### Lemma 9.3.1.4.

Let $\| \cdot \| : \C^m \rightarrow \R$ be a norm and $X \in \C^{m \times m}$ be nonsingular. Then $\| \cdot \|_X : \C^m \rightarrow \R$ defined by $\| y \|_X = \| X y \|$ is a norm.

###### Homework 9.3.1.1.

Prove Lemma 9.3.1.4.

Solution

We need to show that

• If $y \neq 0$ then $\| y \|_X \gt 0 \text{:}$

Let $y \neq 0$ and $z = X y \text{.}$ Then $z \neq 0$ since $X$ is nonsingular. Hence

\begin{equation*} \| y \|_X = \| X y \| = \| z \| \gt 0 . \end{equation*}
• If $\alpha \in \C$ and $y \in \C^m$ then $\| \alpha y \|_X = \vert \alpha \vert \| y \|_X \text{:}$

\begin{equation*} \| \alpha y \|_X = \| X ( \alpha y ) \| = \| \alpha X y \| = \vert \alpha \vert \| X y \| = \vert \alpha \vert \| y \|_X . \end{equation*}
• If $x, y \in \C^m$ then $\| x + y \|_X \leq \| x \|_X + \| y \|_X \text{:}$

\begin{equation*} \| x + y \|_X = \| X ( x + y ) \| = \| X x + X y \| \leq \| X x \| + \| X y \| = \| x \|_X + \| y \|_X . \end{equation*}

With this new norm, we can perform our convergence analysis:

\begin{equation*} \begin{array}{l} v^{(k)} - \psi_0 x_0 \\ ~~~=~~~~\\ A^k v^{(0)} / \lambda_0^k - \psi_0 x_0 \\ ~~~=~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) X^{-1} v^{(0)} - \psi_0 x_0 \\ ~~~=~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y - \psi_0 x_0 \\ ~~~=~~~~\\ X \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \end{array} \end{equation*}

Hence

\begin{equation*} X^{-1} ( v^{(k)} - \psi_0 x_0 )= \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \end{equation*}

and

\begin{equation*} X^{-1} (v^{(k+1)} - \psi_0 x_0 ) = \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \frac{\lambda_1}{\lambda_0} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \frac{\lambda_{m-1}}{\lambda_0} \end{array} \right) X^{-1} ( v^{(k)} - \psi_0 x_0 ). \end{equation*}

Now, let $\| \cdot \|$ be a p-norm and its induced matrix norm, and let $\| \cdot \|_{X^{-1}}$ be as defined in Lemma 9.3.1.4. (We choose a p-norm to make sure that the norm of a diagonal matrix equals the absolute value of the largest element (in magnitude) on its diagonal.) Then

\begin{equation*} \begin{array}{rcl} \| v^{(k+1)} - \psi_0 x_0 \|_{X^{-1}} \amp=\amp \| X^{-1} ( v^{(k+1)} - \psi_0 x_0 )\| \\ \amp=\amp \left\| \left( \begin{array}{ c | c | c | c } 0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \frac{\lambda_1}{\lambda_0} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \frac{\lambda_{m-1}}{\lambda_0} \end{array} \right) X^{-1} ( v^{(k)} - \psi_0 x_0 ) \right\| \\ \amp\leq\amp \left\vert \frac{\lambda_1}{\lambda_0} \right\vert \| X^{-1} ( v^{(k)} - \psi_0 x_0 ) \| = \left\vert \frac{\lambda_1}{\lambda_0} \right\vert \| v^{(k)} - \psi_0 x_0 \|_{X^{-1}}. \end{array} \end{equation*}

This shows that, in this norm, the convergence of $v^{(k)}$ to $\psi_0 x_0$ is linear: the difference between the current approximation, $v^{(k)} \text{,}$ and the solution, $\psi_0 x_0 \text{,}$ is reduced by at least a constant factor in each iteration.

### Subsubsection 9.3.1.4 A practical Power Method

The following algorithm, known as the Power Method, avoids the problem of $v^{(k)}$ growing or shrinking in length without requiring $\lambda_0$ to be known, by scaling it to be of unit length at each step:

\begin{equation*} \begin{array}{l} {\bf for~} k = 0, \ldots \\ ~~~v^{(k+1)} = A v^{(k)} \\ ~~~ v^{(k+1)} = v^{(k+1)} / \| v^{(k+1)} \| \\ {\bf endfor} \end{array} \end{equation*}
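A minimal NumPy sketch of this practical Power Method follows; the symmetric test matrix is a hypothetical example with $\lambda_0 = 3$ and dominant eigenvector $(1,1)^T/\sqrt{2} \text{:}$

```python
import numpy as np

def power_method(A, v0, num_iters=100):
    """Power Method sketch: repeatedly apply A, rescaling to unit length each step."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(num_iters):
        v = A @ v
        v = v / np.linalg.norm(v)   # keep the iterate at unit length
    return v

# Hypothetical example: eigenvalues 3 and 1, dominant eigenvector (1, 1)/sqrt(2).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = power_method(A, np.array([1.0, 0.3]))
```

Since $\vert \lambda_1 / \lambda_0 \vert = 1/3 \text{,}$ the component orthogonal to $x_0$ is damped by a factor of three per iteration, and after 100 iterations the returned vector satisfies $A v \approx 3 v$ to machine precision.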

Its convergence is identical to that of the version analyzed in Subsubsection 9.3.1.3.

### Subsubsection 9.3.1.5 The Rayleigh quotient

A question is how to extract an approximation of $\lambda_0$ given an approximation of $x_0 \text{.}$ The following observation provides the answer:

Let $x$ be an eigenvector of $A$ and $\lambda$ the associated eigenvalue. Then $A x = \lambda x \text{.}$ Multiplying on the left by $x^H$ yields $x^H A x = \lambda x^H x \text{,}$ which, since $x \neq 0 \text{,}$ means that $\lambda = x^H A x / ( x^H x ) \text{.}$

For a given vector $x \neq 0 \text{,}$ the ratio

\begin{equation*} \frac{ x^H A x }{x^H x} \end{equation*}

is known as the Rayleigh quotient.

Since this ratio is a continuous function of $x$ (for $x \neq 0$), an approximation to the eigenvector $x_0$ associated with $\lambda_0 \text{,}$ when plugged into this formula, yields an approximation to $\lambda_0 \text{.}$
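In code, evaluating the Rayleigh quotient at an approximate eigenvector looks as follows (a sketch assuming NumPy; the matrix and vectors are hypothetical examples):

```python
import numpy as np

def rayleigh_quotient(A, x):
    """Return x^H A x / (x^H x) for x != 0."""
    return (np.conj(x) @ (A @ x)) / (np.conj(x) @ x)

# Hypothetical symmetric example with eigenvalues 3 and 1.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

exact = rayleigh_quotient(A, np.array([1.0, 1.0]))    # exact eigenvector: yields 3
approx = rayleigh_quotient(A, np.array([1.0, 0.98]))  # nearby vector: close to 3
```

Note that for Hermitian $A$ the Rayleigh quotient is especially forgiving: a small error in the eigenvector produces an even smaller (quadratically small) error in the eigenvalue estimate.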

### Subsubsection 9.3.1.6 What if $\vert \lambda_0 \vert = \vert \lambda_1 \vert$?

Now, what if

\begin{equation*} \vert \lambda_0 \vert = \cdots = \vert \lambda_{n-1} \vert \gt \vert \lambda_n \vert \geq \cdots \geq \vert \lambda_{m-1} \vert ? \end{equation*}

By extending the above analysis one can easily show that $v^{(k)}$ will converge to a vector in the subspace spanned by the eigenvectors associated with $\lambda_0, \ldots , \lambda_{n-1} \text{.}$

An important special case is when $n = 2\text{:}$ if $A$ is real valued then $\lambda_0$ may be complex valued in which case its conjugate, $\bar \lambda_0 \text{,}$ is also an eigenvalue and hence has the same magnitude as $\lambda_0 \text{.}$ We deduce that $v^{(k)}$ will always be in the space spanned by the eigenvectors corresponding to $\lambda_0$ and $\bar \lambda_0 \text{.}$