Subsection 9.3.1 The Power Method

The Power Method is a simple method that, under mild conditions, yields an eigenvector corresponding to the eigenvalue that is largest in magnitude.

Throughout this section we will assume that a given matrix \(A \in \C^{m \times m} \) is diagonalizable. Thus, there exists a nonsingular matrix \(X \) and diagonal matrix \(\Lambda\) such that \(A = X \Lambda X^{-1} \text{.}\) From the last section, we know that the columns of \(X \) equal eigenvectors of \(A \) and the elements on the diagonal of \(\Lambda \) equal the eigenvalues:

\begin{equation*} X = \left( \begin{array}{c | c | c | c} x_0 \amp x_1 \amp \cdots \amp x_{m-1} \end{array} \right) \quad \mbox{and} \quad \Lambda = \left( \begin{array}{c | c | c | c} \lambda_{0} \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \lambda_{1} \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \lambda_{m-1} \end{array} \right) \end{equation*}

so that

\begin{equation*} A x_i = \lambda_i x_i \quad \mbox{for } i = 0, \ldots, m-1. \end{equation*}

For most of this section we will assume that

\begin{equation*} \vert \lambda_0 \vert \gt \vert \lambda_1 \vert \geq \cdots \geq \vert \lambda_{m-1} \vert . \end{equation*}

In particular, \(\lambda_0 \) is the eigenvalue with maximal absolute value.

Subsubsection 9.3.1.1 First attempt

Let \(v^{(0)} \in \C^{m} \) be an ``initial guess''. Our (first attempt at the) Power Method iterates as follows:

\begin{equation*} \begin{array}{l} \mbox{Pick } v^{(0)} \\ {\bf for~} k = 0, \ldots \\ ~~~ v^{(k+1)} = A v^{(k)} \\ {\bf endfor} \end{array} \end{equation*}

Clearly \(v^{(k)} = A^k v^{(0)} \text{.}\)
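To make this concrete, here is a minimal numerical sanity check, with an arbitrarily chosen example matrix (not from the text), that \(k \) steps of the iteration indeed compute \(A^k v^{(0)} \text{:}\)

```python
import numpy as np

# A quick check (arbitrary example matrix) that k steps of the iteration
# v^(k+1) = A v^(k) produce v^(k) = A^k v^(0).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
v = v0 = rng.standard_normal(3)

k = 7
for _ in range(k):
    v = A @ v                                               # v^(k+1) = A v^(k)

print(np.allclose(v, np.linalg.matrix_power(A, k) @ v0))    # True
```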

Let

\begin{equation*} v^{(0)} = \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} . \end{equation*}

Then

\begin{equation*} \begin{array}{rcl} v^{(1)} = A v^{(0)} \amp=\amp A \left( \psi_0 x_0 + \psi_1 x_1 + \cdots + \psi_{m-1} x_{m-1} \right) \\ \amp=\amp \psi_0 A x_0 + \psi_1 A x_1 + \cdots + \psi_{m-1} A x_{m-1} \\ \amp=\amp \psi_0 \lambda_0 x_0 + \psi_1 \lambda_1 x_1 + \cdots + \psi_{m-1} \lambda_{m-1} x_{m-1}, \\ v^{(2)} = A v^{(1)} \amp=\amp A \left( \psi_0 \lambda_0 x_0 + \psi_1 \lambda_1 x_1 + \cdots + \psi_{m-1} \lambda_{m-1} x_{m-1} \right) \\ \amp=\amp \psi_0 \lambda_0 A x_0 + \psi_1 \lambda_1 A x_1 + \cdots + \psi_{m-1} \lambda_{m-1} A x_{m-1} \\ \amp = \amp \psi_0 \lambda_0^2 x_0 + \psi_1 \lambda_1^2 x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^2 x_{m-1}, \\ \amp \vdots \amp \\ v^{(k)} = A v^{(k-1)} \amp=\amp \psi_0 \lambda_0^k x_0 + \psi_1 \lambda_1^k x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^k x_{m-1}. \end{array} \end{equation*}

Now, as long as \(\psi_0 \neq 0 \text{,}\) the term \(\psi_0 \lambda_0^k x_0 \) will eventually dominate, since

\begin{equation*} \vert \lambda_i \vert/\vert \lambda_0\vert \lt 1 \quad \mbox{for } i = 1, \ldots, m-1. \end{equation*}

This means that \(v^{(k)} \) will start pointing in the direction of \(x_0 \text{.}\) In other words, it will start pointing in the direction of an eigenvector corresponding to \(\lambda_0 \text{.}\) The problem is that \(v^{(k)} \) will grow without bound in length if \(\vert \lambda_0 \vert \gt 1 \) or shrink to zero in length if \(\vert \lambda_0 \vert \lt 1 \text{.}\) All is good if \(\vert \lambda_0 \vert = 1 \text{.}\)
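The following sketch illustrates both observations, using a hypothetical diagonal matrix chosen so that the eigenvalues \((4, 2, 1) \) and eigenvectors \((e_0, e_1, e_2) \) are obvious: the length of \(v^{(k)} \) blows up since \(\vert \lambda_0 \vert \gt 1 \text{,}\) while its direction converges to that of \(x_0 = e_0 \text{.}\)

```python
import numpy as np

# A sketch of the first attempt on a hypothetical diagonal example, with
# eigenvalues 4, 2, 1 and eigenvectors e_0, e_1, e_2.
A = np.diag([4.0, 2.0, 1.0])            # |lambda_0| = 4 > 1
v = np.array([1.0, 1.0, 1.0])           # psi_0 = psi_1 = psi_2 = 1

for k in range(1, 11):
    v = A @ v                           # v^(k+1) = A v^(k)
    print(k, np.linalg.norm(v))         # the length grows like |lambda_0|^k

print(v / np.linalg.norm(v))            # the direction approaches x_0 = e_0
```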

An alternative way of looking at this is to exploit the fact that the eigenvectors, \(x_i \text{,}\) equal the columns of \(X \text{.}\) Then

\begin{equation*} y = \left( \begin{array}{c} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{array} \right) = X^{-1} v^{(0)} \end{equation*}

and

\begin{equation*} \begin{array}{rcl} v^{(0)} = A^0 v^{(0)} \amp = \amp X y \\ v^{(1)} = A v^{(0)} \amp=\amp A X y = X \Lambda y \\ v^{(2)} = A v^{(1)} \amp=\amp A X \Lambda y = X \Lambda^2 y \\ \amp \vdots \amp \\ v^{(k)} = A v^{(k-1)} \amp=\amp A X \Lambda^{k-1} y = X \Lambda^{k} y \\ \end{array} \end{equation*}

Thus

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp = \amp \left( \begin{array}{c c c c} x_0 \amp x_1 \amp \cdots \amp x_{m-1} \end{array} \right) \left( \begin{array}{c c c c} \lambda_0^k \amp 0 \amp \cdots \amp 0 \\ 0 \amp \lambda_1^k \amp \cdots \amp 0 \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0 \amp \cdots \amp \lambda_{m-1}^k \end{array} \right) \left( \begin{array}{c} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{array} \right) \\ \amp = \amp \psi_0 \lambda_0^k x_0 + \psi_1 \lambda_1^k x_1 + \cdots + \psi_{m-1} \lambda_{m-1}^k x_{m-1}. \end{array} \end{equation*}

Notice how looking at \(v^{(k)} \) in the right basis (the eigenvectors) simplified the explanation.
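As a numerical sanity check, here is a small sketch, with a randomly generated diagonalizable \(A \) chosen purely for illustration, verifying that \(A^k v^{(0)} = X \Lambda^k y \) with \(y = X^{-1} v^{(0)} \text{:}\)

```python
import numpy as np

# A check (random diagonalizable A, purely for illustration) that
# A^k v^(0) = X Lambda^k y with y = X^{-1} v^(0).
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))                 # columns play the role of x_i
d = np.array([3.0, 2.0, 1.0, 0.5])              # eigenvalues lambda_i
A = X @ np.diag(d) @ np.linalg.inv(X)           # A = X Lambda X^{-1}

v0 = rng.standard_normal(4)
y = np.linalg.solve(X, v0)                      # y = X^{-1} v^(0)

k = 5
vk = np.linalg.matrix_power(A, k) @ v0          # v^(k) = A^k v^(0)
print(np.allclose(vk, X @ (d**k * y)))          # X Lambda^k y; prints True
```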

Subsubsection 9.3.1.2 Second attempt

Given an initial \(v^{(0)} \in \C^{m} \text{,}\) a second attempt at the Power Method iterates as follows:

\begin{equation*} \begin{array}{l} \mbox{Pick } v^{(0)} \\ {\bf for~} k = 0, \ldots \\ ~~~ v^{(k+1)} = A v^{(k)} / \lambda_0 \\ {\bf endfor} \end{array} \end{equation*}

It is not hard to see that

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp=\amp A v^{(k-1)} / \lambda_0 = A^k v^{(0)} / \lambda_0^k \\ \amp = \amp \psi_0 \left( \frac{\lambda_0}{\lambda_0} \right) ^k x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right) ^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1} \\ \amp = \amp \psi_0 x_0 + \psi_1 \left( \frac{\lambda_1}{\lambda_0} \right) ^k x_1 + \cdots + \psi_{m-1} \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k x_{m-1}. \end{array} \end{equation*}

Clearly \(\lim_{k \rightarrow \infty} v^{(k)} = \psi_0 x_0 \text{,}\) as long as \(\psi_0 \neq 0 \text{,}\) since \(\left\vert \frac{ \lambda_j }{ \lambda_0 } \right\vert \lt 1 \) for \(j \gt 0 \text{.}\)
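A sketch of this second attempt follows, reusing the hypothetical diagonal example from before. Here \(\lambda_0 = 4 \) is known only because we constructed the matrix ourselves; in practice it is not available, which is exactly the problem noted at the end of this subsubsection.

```python
import numpy as np

# A sketch of the second attempt; lambda_0 = 4 is known only because we
# construct the example ourselves.
A = np.diag([4.0, 2.0, 1.0])
lambda0 = 4.0

v = np.array([1.0, 1.0, 1.0])           # psi_0 = psi_1 = psi_2 = 1
for _ in range(50):
    v = A @ v / lambda0                 # v^(k+1) = A v^(k) / lambda_0
print(v)                                # approaches psi_0 x_0 = (1, 0, 0)^T
```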

Another way of stating this is to notice that

\begin{equation*} A^k = \begin{array}[t]{c} \underbrace{( A A \cdots A )} \\ k \mbox{ times} \end{array} = \begin{array}[t]{c} \underbrace{( X \Lambda X^{-1}) ( X \Lambda X^{-1}) \cdots ( X \Lambda X^{-1} )} \\ k \mbox{ times} \end{array} = X \Lambda^k X^{-1}, \end{equation*}

so that

\begin{equation*} \begin{array}{rcl} v^{(k)} \amp=\amp A^k v^{(0)} / \lambda_0^k \\ \amp=\amp A^k X y / \lambda_0^k \\ \amp=\amp X \Lambda^k X^{-1} X y / \lambda_0^k \\ \amp=\amp X \Lambda^k y / \lambda_0^k \\ \amp=\amp X \left( \Lambda^k / \lambda_0^k \right) y = X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y. \end{array} \end{equation*}

Now, since \(\left\vert \frac{\lambda_{j}}{\lambda_0} \right\vert \lt 1 \) for \(j \gt 0 \text{,}\) we can argue that

\begin{equation*} \begin{array}{l} \lim_{k \rightarrow \infty} v^{(k)} \\ ~~~= ~~~~\\ \lim_{k \rightarrow \infty} X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp \left( \frac{\lambda_1}{\lambda_0} \right)^k \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp \left( \frac{\lambda_{m-1}}{\lambda_0} \right)^k \end{array} \right) y \\ ~~~= ~~~~\\ X \left( \begin{array}{ c | c | c | c } 1 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp 0 \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp 0 \end{array} \right) y \\ ~~~=~~~~ \\ X \psi_0 e_0 \\ ~~~= ~~~~\\ \psi_0 X e_0 = \psi_0 x_0. \end{array} \end{equation*}

Thus, as long as \(\psi_0 \neq 0 \) (which means \(v^{(0)} \) must have a component in the direction of \(x_0 \)) this method will eventually yield a vector in the direction of \(x_0 \text{.}\) However, this time the problem is that we don't know \(\lambda_0 \) when we start.

Subsubsection 9.3.1.3 A practical Power Method

The following algorithm, known as the Power Method, avoids the problem of \(v^{(k)} \) growing or shrinking in length without requiring \(\lambda_0 \) to be known, by scaling it to be of unit length at each step:

\begin{equation*} \begin{array}{l} \mbox{Pick } v^{(0)} \mbox{ of unit length} \\ {\bf for~} k = 0, \ldots \\ ~~~v^{(k+1)} = A v^{(k)} \\ ~~~ v^{(k+1)} = v^{(k+1)} / \| v^{(k+1)} \| \\ {\bf endfor} \end{array} \end{equation*}

The idea is that we are only interested in the direction of the eigenvector, and hence it is convenient to rescale the vector to have unit length at each step.
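A minimal sketch of this algorithm follows; the function name power_method and the small symmetric test matrix are our own choices for illustration, not from the text.

```python
import numpy as np

# A minimal sketch of the practical Power Method: normalize at every step so
# the iterate stays at unit length while its direction converges to x_0.
def power_method(A, num_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v = v / np.linalg.norm(v)                # v^(0) of unit length
    for _ in range(num_iters):
        v = A @ v                            # v^(k+1) = A v^(k)
        v = v / np.linalg.norm(v)            # rescale to unit length
    return v

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                   # eigenvalues 3 and 1
print(power_method(A))                       # approx. +-(1, 1)^T / sqrt(2)
```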

Subsubsection 9.3.1.4 The Rayleigh quotient

A question is how to extract an approximation of \(\lambda_0 \) given an approximation of \(x_0 \text{.}\) The following insights provide the answer:

Definition 9.3.1.1. Rayleigh quotient.

If \(A \in \Cmxm \) and \(0 \neq x \in \Cm \text{,}\) then

\begin{equation*} \frac{ x^H A x }{x^H x} \end{equation*}

is known as the Rayleigh quotient.

Homework 9.3.1.1.

Let \(x \) be an eigenvector of \(A \text{.}\)

ALWAYS/SOMETIMES/NEVER: \(\lambda = x^H A x / (x^H x) \) is the associated eigenvalue of \(A \text{.}\)

Answer

ALWAYS

Now prove it!

Solution

Let \(x \) be an eigenvector of \(A \) and \(\lambda \) the associated eigenvalue. Then \(A x = \lambda x \text{.}\) Multiplying on the left by \(x^H \) yields \(x^H A x = \lambda x^H x \) which, since \(x \neq 0 \) means that \(\lambda = x^H A x / ( x^H x ) \text{.}\)

If \(x \) is an approximation of the eigenvector \(x_0 \) associated with \(\lambda_0 \text{,}\) then its Rayleigh quotient is an approximation to \(\lambda_0 \text{.}\)
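Putting the pieces together, here is a sketch, with an assumed small symmetric example matrix, of running the practical Power Method and then recovering an approximation of \(\lambda_0 \) via the Rayleigh quotient:

```python
import numpy as np

# A sketch: run the practical Power Method, then estimate lambda_0 from the
# final iterate via the Rayleigh quotient x^H A x / (x^H x).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                   # eigenvalues 3 and 1

v = np.array([1.0, 0.0])                     # v^(0) of unit length
for _ in range(100):
    v = A @ v
    v = v / np.linalg.norm(v)

lam = (v.conj() @ A @ v) / (v.conj() @ v)    # Rayleigh quotient
print(lam)                                   # approx. 3.0 = lambda_0
```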