## Unit 9.3.1 The Power Method

The Power Method is a simple method that, under mild conditions, yields a vector in the direction of the eigenvector corresponding to the eigenvalue that is largest in magnitude.

Throughout this section we will assume that a given matrix $A \in \C^{m \times m}$ is diagonalizable. Thus, there exists a nonsingular matrix $X$ and diagonal matrix $\Lambda$ such that $A = X \Lambda X^{-1} \text{.}$ From the last section, we know that the columns of $X$ equal eigenvectors of $A$ and the elements on the diagonal of $\Lambda$ equal the eigenvalues:

\begin{equation*} X = \left( \begin{array}{c | c | c | c} x_0 \amp x_1 \amp \cdots \amp x_{m-1} \end{array} \right) \quad \mbox{and} \quad \Lambda = \left( \begin{array}{c | c | c | c} \lambda_{0} \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \lambda_{1} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \lambda_{m-1} \end{array} \right) \end{equation*}

so that

\begin{equation*} A x_i = \lambda_i x_i \quad \mbox{for } i = 0, \ldots, m-1. \end{equation*}

For most of this section we will assume that

\begin{equation*} \vert \lambda_0 \vert \gt \vert \lambda_1 \vert \geq \cdots \geq \vert \lambda_{m-1} \vert . \end{equation*}

In particular, $\lambda_0$ is the eigenvalue with maximal absolute value.

### Subsubsection 9.3.1.1 First attempt

Let $v^{(0)} \in \C^{m}$ be an "initial guess." Our (first attempt at the) Power Method iterates as follows:

\begin{equation*} \begin{array}{l} \mbox{Pick } v^{(0)} \\ {\bf for~} k = 0, \ldots \\ ~~~ v^{(k+1)} = A v^{(k)} \\ {\bf endfor} \end{array} \end{equation*}

Clearly $v^{(k)} = A^k v^{(0)} \text{.}$
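
As a quick illustration (not part of the original text), here is a minimal NumPy sketch of this first attempt; the $2 \times 2$ matrix, with eigenvalues $5$ and $2$, is an illustrative choice. The printed norms grow like $5^k$, which is exactly the difficulty discussed below.

```python
# Minimal sketch of the first attempt at the Power Method.
# The matrix A is an illustrative choice with eigenvalues 5 and 2.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
v = rng.standard_normal(2)           # initial guess v^(0)

for k in range(10):
    v = A @ v                        # v^(k+1) = A v^(k)
    print(k + 1, np.linalg.norm(v))  # norm grows roughly like 5^k
```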

Let

\begin{equation*} v^{(0)} = \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} . \end{equation*}

Then

\begin{equation*} \begin{array}{rcl} v^{(1)} = A v^{(0)} \amp=\amp A \left( \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} \right) \\ \amp=\amp \psi_0 A x_0 + \psi_1 A x_1 + \cdots + \psi_{m-1} A x_{m-1} \\ \amp=\amp \psi_0 \lambda_0 x_0 + \psi_1 \lambda_1 x_1 + \cdots + \psi_{m-1} \lambda_{m-1} x_{m-1}, \\ v^{(2)} = A v^{(1)} \amp=\amp A \left( \psi_0 \lambda_0 x_0 + \psi_1 \lambda_1 x_1 + \cdots + \psi_{m-1} \lambda_{m-1} x_{m-1} \right) \\ \amp=\amp \psi_0 \lambda_0 A x_0 + \psi_1 \lambda_1 A x_1 + \cdots + \psi_{m-1} \lambda_{m-1} A x_{m-1} \\ \amp = \amp \psi_0 \lambda_0^2 x_0 + \psi_1 \lambda_1^2 x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^2 x_{m-1}, \\ \amp \vdots \amp \\ v^{(k)} = A v^{(k-1)} \amp=\amp \psi_0 \lambda_0^k x_0 + \psi_1 \lambda_1^k x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^k x_{m-1}. \end{array} \end{equation*}

Now, as long as $\psi_0 \neq 0 \text{,}$ clearly $\psi_0 \lambda_0^k x_0$ will eventually dominate since

\begin{equation*} \vert \lambda_i \vert/\vert \lambda_0\vert \lt 1 \quad \mbox{for } i = 1, \ldots, m-1. \end{equation*}

This means that $v^{(k)}$ will start pointing in the direction of $x_0 \text{.}$ In other words, it will start pointing in the direction of an eigenvector corresponding to $\lambda_0 \text{.}$ The problem is that it will become infinitely long if $\vert \lambda_0 \vert \gt 1$ or infinitesimally short if $\vert \lambda_0 \vert \lt 1 \text{.}$ All is good if $\vert \lambda_0 \vert = 1 \text{.}$

An alternative way of looking at this is to exploit the fact that the eigenvectors, $x_i \text{,}$ equal the columns of $X \text{.}$ Then

\begin{equation*} y = \left( \begin{array}{c} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{array} \right) = X^{-1} v^{(0)} \end{equation*}

and

\begin{equation*} \begin{array}{rcl} v^{(0)} = A^0 v^{(0)} \amp = \amp X y \\ v^{(1)} = A v^{(0)} \amp=\amp A X y = X \Lambda y \\ v^{(2)} = A v^{(1)} \amp=\amp A X \Lambda y = X \Lambda^2 y \\ \amp \vdots \amp \\ v^{(k)} = A v^{(k-1)} \amp=\amp A X \Lambda^{k-1} y = X \Lambda^{k} y \\ \end{array} \end{equation*}

Thus

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp = \amp \left( \begin{array}{c c c c} x_0 \amp x_1 \amp \cdots \amp x_{m-1} \end{array} \right) \left( \begin{array}{c c c c} \lambda_0^k \amp 0 \amp \cdots \amp 0 \\ 0 \amp \lambda_1^k \amp \cdots \amp 0 \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0 \amp \cdots \amp \lambda_{m-1}^k \end{array} \right) \left( \begin{array}{c} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{array} \right) \\ \amp = \amp \psi_0 \lambda_0^k x_0 + \psi_1 \lambda_1^k x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^k x_{m-1}. \end{array} \end{equation*}

Notice how looking at $v^{(k)}$ in the right basis (the eigenvectors) simplified the explanation.
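
This change of basis can also be checked numerically. The following sketch (same illustrative matrix as before, with $X$ and $\Lambda$ taken from numpy.linalg.eig) verifies that $A^k v^{(0)}$ and $X \Lambda^k y$ produce the same vector:

```python
# Numerical check that v^(k) = A^k v^(0) equals X Lambda^k y.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam, X = np.linalg.eig(A)                 # A = X diag(lam) X^{-1}
v0 = np.array([1.0, 0.0])
y = np.linalg.solve(X, v0)                # y = X^{-1} v^(0)

k = 4
print(np.linalg.matrix_power(A, k) @ v0)  # v^(k) = A^k v^(0)
print(X @ (lam**k * y))                   # X Lambda^k y: the same vector
```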

### Subsubsection 9.3.1.2 Second attempt

Given an initial $v^{(0)} \in \C^{m} \text{,}$ a second attempt at the Power Method iterates as follows:

\begin{equation*} \begin{array}{l} \mbox{Pick } v^{(0)} \\ {\bf for~} k = 0, \ldots \\ ~~~ v^{(k+1)} = A v^{(k)} / \lambda_0 \\ {\bf endfor} \end{array} \end{equation*}

It is not hard to see that then

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp=\amp A v^{(k-1)} / \lambda_0 = A^k v^{(0)} / \lambda_0^k \\ \amp = \amp \psi_0 \left( \frac{\lambda_0}{\lambda_0} \right) ^k x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right) ^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1} \\ \amp = \amp \psi_0 x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right) ^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1}. \end{array} \end{equation*}

Clearly $\lim_{k \rightarrow \infty} v^{(k)} = \psi_0 x_0 \text{,}$ as long as $\psi_0 \neq 0 \text{,}$ since $\left\vert \frac{ \lambda_i }{ \lambda_0 } \right\vert \lt 1$ for $i \gt 0 \text{.}$

Another way of stating this is to notice that

\begin{equation*} A^k = \begin{array}[t]{c} \underbrace{( A A \cdots A )} \\ k \mbox{ times} \end{array} = \begin{array}[t]{c} \underbrace{( X \Lambda X^{-1}) ( X \Lambda X^{-1}) \cdots ( X \Lambda X^{-1} )} \\ k \mbox{ times} \end{array} = X \Lambda^k X^{-1}, \end{equation*}

so that

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp=\amp A^k v^{(0)} / \lambda_0^k \\ \amp=\amp A^k X y / \lambda_0^k \\ \amp=\amp X \Lambda^k X^{-1} X y / \lambda_0^k \\ \amp=\amp X \Lambda^k y / \lambda_0^k \\ \amp=\amp X \left( \Lambda^k / \lambda_0^k \right) y = X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y. \end{array} \end{equation*}

Now, since $\left\vert \frac{\lambda_{i}}{\lambda_0} \right\vert \lt 1$ for $i \gt 0$ we can argue that

\begin{equation*} \begin{array}{l} \lim_{k \rightarrow \infty} v^{(k)} \\ ~~~= ~~~~\\ \lim_{k \rightarrow \infty} X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \\ ~~~= ~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp 0 \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp 0 \end{array} \right) y \\ ~~~=~~~~ \\ X \psi_0 e_0 \\ ~~~= ~~~~\\ \psi_0 X e_0 = \psi_0 x_0. \end{array} \end{equation*}

Thus, as long as $\psi_0 \neq 0$ (which means $v^{(0)}$ must have a component in the direction of $x_0$) this method will eventually yield a vector in the direction of $x_0 \text{.}$ However, this time the problem is that we don't know $\lambda_0$ when we start.
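
A sketch of this second attempt, where for demonstration purposes we simply supply the dominant eigenvalue $\lambda_0 = 5$ of the illustrative matrix, something we of course cannot do in practice:

```python
# Second attempt: scale by lambda_0 at each step.
# For this demo we "cheat" and supply lambda_0 = 5 directly.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam0 = 5.0                               # dominant eigenvalue of this A
v = np.array([1.0, 0.0])

for k in range(25):
    v = A @ v / lam0                     # v^(k+1) = A v^(k) / lambda_0
print(v)                                 # approaches psi_0 x_0
print(np.linalg.norm(A @ v / lam0 - v))  # update barely changes v anymore
```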

### Subsubsection 9.3.1.3 A practical Power Method

The following algorithm, known as the Power Method, avoids the problem of $v^{(k)}$ growing or shrinking in length without requiring $\lambda_0$ to be known, by scaling it to be of unit length at each step:

\begin{equation*} \begin{array}{l} \mbox{Pick } v^{(0)} \mbox{ of unit length} \\ {\bf for~} k = 0, \ldots \\ ~~~v^{(k+1)} = A v^{(k)} \\ ~~~ v^{(k+1)} = v^{(k+1)} / \| v^{(k+1)} \| \\ {\bf endfor} \end{array} \end{equation*}

The idea is that we are only interested in the direction of the eigenvector, and hence it is convenient to rescale the vector to have unit length at each step.
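
A minimal, runnable sketch of this algorithm in NumPy (the matrix and iteration count are illustrative choices, not prescribed by the text):

```python
# Practical Power Method: rescale to unit length at each step.
import numpy as np

def power_method(A, v0, num_iters=100):
    """Return a unit vector approximating a dominant eigenvector of A."""
    v = v0 / np.linalg.norm(v0)     # pick v^(0) of unit length
    for _ in range(num_iters):
        v = A @ v                   # v^(k+1) = A v^(k)
        v = v / np.linalg.norm(v)   # rescale to unit length
    return v

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
print(power_method(A, np.array([1.0, 0.0])))  # approx x_0 / || x_0 ||
```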

### Subsubsection 9.3.1.4 The Rayleigh quotient

A question is how to extract an approximation of $\lambda_0$ given an approximation of $x_0 \text{.}$ The following insights provide the answer:

###### Definition 9.3.1.1. Rayleigh quotient.

If $A \in \C^{m \times m}$ and $x \in \C^{m}$ is nonzero, then

\begin{equation*} \frac{ x^H A x }{x^H x} \end{equation*}

is known as the Rayleigh quotient.

###### Homework 9.3.1.1.

Let $x$ be an eigenvector of $A \text{.}$

ALWAYS/SOMETIMES/NEVER: $\lambda = x^H A x / (x^H x)$ is the associated eigenvalue of $A \text{.}$

###### Solution.

Let $x$ be an eigenvector of $A$ and $\lambda$ the associated eigenvalue. Then $A x = \lambda x \text{.}$ Multiplying on the left by $x^H$ yields $x^H A x = \lambda x^H x$ which, since $x \neq 0 \text{,}$ means that $\lambda = x^H A x / ( x^H x ) \text{.}$ Hence the answer is ALWAYS.

If $x$ is an approximation of the eigenvector $x_0$ associated with $\lambda_0 \text{,}$ then its Rayleigh quotient is an approximation to $\lambda_0 \text{.}$
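
Combining the practical Power Method with the Rayleigh quotient yields an approximation of $\lambda_0$ as well; the following sketch reuses the illustrative matrix from before:

```python
# Approximating lambda_0 via the Rayleigh quotient of the Power Method iterate.
import numpy as np

def rayleigh_quotient(A, x):
    """Compute x^H A x / (x^H x); np.vdot conjugates its first argument."""
    return np.vdot(x, A @ x) / np.vdot(x, x)

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
v = np.array([1.0, 0.0])
for _ in range(50):
    v = A @ v
    v = v / np.linalg.norm(v)        # practical Power Method step
print(rayleigh_quotient(A, v))       # approx 5.0, the dominant eigenvalue
```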