## Unit 8.5.2 Summary

Given a function $f: \mathbb R^n \rightarrow \mathbb R \text{,}$ its gradient is given by

\begin{equation*} \nabla f( x ) = \left( \begin{array}{c} \frac{\partial f}{\partial \chi_0}( x ) \\ \frac{\partial f}{\partial \chi_1}( x ) \\ \vdots \\ \frac{\partial f}{\partial \chi_{n-1}}( x ) \end{array} \right). \end{equation*}

$\nabla f( x )$ equals the direction in which the function $f$ increases most rapidly at the point $x$ and $- \nabla f( x )$ equals the direction of steepest descent (the direction in which the function $f$ decreases most rapidly at the point $x$).

In this summary, we will assume that $A \in \mathbb R^{n \times n}$ is symmetric positive definite (SPD) and

\begin{equation*} f( x ) = \frac{1}{2} x^T A x - x^T b. \end{equation*}

The gradient of this function equals

\begin{equation*} \nabla f( x ) = A x - b \end{equation*}

and $\widehat x$ minimizes the function if and only if

\begin{equation*} A \widehat x = b . \end{equation*}
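As a quick sanity check, here is a minimal NumPy sketch (the matrix, right-hand side, and sizes are made up for illustration) that compares the gradient $A x - b$ with a finite-difference approximation and confirms that the gradient vanishes at $\widehat x = A^{-1} b \text{:}$

```python
import numpy as np

# Illustrative problem: a small SPD matrix A and right-hand side b
# (the sizes and random data are made up for this sketch).
rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
A = B.T @ B + n * np.eye(n)          # SPD by construction
b = rng.standard_normal(n)

def f(x):
    # f(x) = 1/2 x^T A x - x^T b
    return 0.5 * x @ A @ x - x @ b

x = rng.standard_normal(n)
grad = A @ x - b                      # the gradient derived above

# Central finite differences agree with the analytic gradient.
eps = 1.0e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(grad, fd, atol=1.0e-6))

# The minimizer xhat satisfies A xhat = b, i.e., its gradient vanishes there.
xhat = np.linalg.solve(A, b)
print(np.allclose(A @ xhat - b, np.zeros(n)))
```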

If $x^{(k)}$ is an approximation to $\widehat x \text{,}$ then $r^{(k)} = b - A x^{(k)}$ equals the corresponding residual. Notice that $r^{(k)} = -\nabla f( x^{(k)})$ and hence $r^{(k)}$ is the direction of steepest descent.

A prototypical descent method is given by

\begin{equation*} \begin{array}{l} {\bf Given:} A, b, x^{(0)} \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := \mbox{ next direction } \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \mbox{ for some scalar } \alpha_k \\ ~~~ r^{(k+1)} := b - A x^{(k+1)} \\ ~~~ k := k+1 \\ {\bf endwhile} \end{array} \end{equation*}

Here $p^{(k)}$ is the "current" search direction and in each iteration we create the next approximation to $\widehat x \text{,}$ $x^{(k+1)} \text{,}$ along the line $x^{(k)} + \alpha p^{(k)} \text{.}$

If $x^{(k+1)}$ minimizes along that line, the method is an exact descent method and

\begin{equation*} \alpha_k = \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \end{equation*}

so that a prototypical exact descent method is given by

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x^{(0)} \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := \mbox{ next direction } \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := b - A x^{(k+1)} \\ ~~~k := k+1 \\ {\bf endwhile} \end{array} \end{equation*}
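The value of $\alpha_k$ used here can be recovered by setting to zero the derivative, with respect to $\alpha \text{,}$ of $f( x^{(k)} + \alpha p^{(k)} ) \text{:}$

\begin{equation*} \frac{d}{d \alpha} f( x^{(k)} + \alpha p^{(k)} ) = p^{(k)\,T} A ( x^{(k)} + \alpha p^{(k)} ) - p^{(k)\,T} b = \alpha \, p^{(k)\,T} A p^{(k)} - p^{(k)\,T} r^{(k)} = 0 , \end{equation*}

so that $\alpha = p^{(k)\,T} r^{(k)} / ( p^{(k)\,T} A p^{(k)} ) \text{.}$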

Once $\alpha_k$ is determined, the residual can be updated cheaply, since

\begin{equation*} r^{(k+1)} = b - A x^{(k+1)} = b - A ( x^{(k)} + \alpha_k p^{(k)} ) = r^{(k)} - \alpha_k A p^{(k)} , \end{equation*}

which saves a matrix-vector multiplication when incorporated into the prototypical exact descent method:

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x^{(0)} \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := \mbox{ next direction } \\ ~~~ q^{(k)} := A p^{(k)} \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} q^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k q^{(k)} \\ ~~~k := k+1 \\ {\bf endwhile} \end{array} \end{equation*}

The steepest descent algorithm chooses $p^{(k)} = - \nabla f( x^{(k)} ) = b - A x^{(k)} = r^{(k)} \text{:}$

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x^{(0)} \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := r^{(k)} \\ ~~~ q^{(k)} := A p^{(k)} \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} q^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k q^{(k)} \\ ~~~k := k+1 \\ {\bf endwhile} \end{array} \end{equation*}
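A minimal NumPy sketch of this steepest descent loop (the tolerance and iteration cap are practical additions, not part of the pseudocode above):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1.0e-10, max_iter=1000):
    """Steepest descent for SPD A, following the loop above.
    (tol and max_iter are practical additions, not in the pseudocode.)"""
    x = x0.copy()
    r = b - A @ x                       # r^(0) := b - A x^(0)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        p = r                           # p^(k) := r^(k)
        q = A @ p                       # q^(k) := A p^(k)
        alpha = (p @ r) / (p @ q)       # alpha_k
        x = x + alpha * p
        r = r - alpha * q               # cheap residual update
    return x
```

With the $A$ and $b$ from the earlier sketch, `steepest_descent(A, b, np.zeros(n))` converges to the solution of $A x = b \text{;}$ convergence is slow when $A$ is ill conditioned, which motivates the refinements that follow.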

Convergence can be greatly accelerated by incorporating a preconditioner, $M \text{,}$ where, ideally, $M \approx A$ is SPD and solving $M z = y$ is easy (cheap).

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x^{(0)}, M \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := M^{-1} r^{(k)} \\ ~~~ q^{(k)} := A p^{(k)} \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} q^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k q^{(k)} \\ ~~~k := k+1 \\ {\bf endwhile} \end{array} \end{equation*}
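In code, the preconditioner is most naturally supplied as a routine that applies $M^{-1} \text{.}$ The sketch below assumes such a routine, here called `M_solve` (a name introduced for this sketch); the Jacobi choice $M = \mbox{diag}(A)$ in the final comment is just one simple, cheap example:

```python
import numpy as np

def preconditioned_steepest_descent(A, b, x0, M_solve, tol=1.0e-10, max_iter=1000):
    """Preconditioned steepest descent; M_solve(y) should return M^{-1} y.
    (M_solve, tol, and max_iter are names introduced for this sketch.)"""
    x = x0.copy()
    r = b - A @ x
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        p = M_solve(r)                  # p^(k) := M^{-1} r^(k)
        q = A @ p
        alpha = (p @ r) / (p @ q)
        x = x + alpha * p
        r = r - alpha * q
    return x

# One simple, cheap SPD preconditioner: M = diag(A) (Jacobi), so that applying
# M^{-1} is an elementwise division:
# x = preconditioned_steepest_descent(A, b, np.zeros(n), lambda y: y / np.diag(A))
```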
###### Definition 8.5.2.1. A-conjugate directions.

Let $A$ be SPD. A sequence $p^{(0)}, \ldots, p^{(k-1)} \in \mathbb R^n$ such that $p^{(j)\,T} A p^{(i)} = 0$ if and only if $j \neq i$ is said to be A-conjugate.

The columns of $P \in \mathbb R^{n \times k}$ are A-conjugate if and only if $P^T A P = D$ where $D$ is diagonal and has positive values on its diagonal.

A-conjugate vectors are linearly independent.

A descent method that chooses the search directions to be A-conjugate will find the solution of $A x = b \text{,}$ where $A \in \mathbb R^{n \times n}$ is SPD, in at most $n$ iterations:

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b \\ x^{(0)} := 0 \\ r^{(0)} := b \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ \mbox{Choose } p^{(k)} \mbox{ such that } p^{(k)\,T} A P^{(k-1)} = 0 \mbox{ and } p^{(k)\,T} r^{(k)} \neq 0 \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\ ~~~ k:= k+1 \\ {\bf endwhile} \end{array} \end{equation*}
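One simple (and deliberately naive) way to realize this in code is to generate the A-conjugate directions by A-orthogonalizing the standard basis vectors; the NumPy sketch below is purely illustrative, not how the directions are chosen in practice:

```python
import numpy as np

def conjugate_directions_solve(A, b):
    """Conjugate-direction descent where the A-conjugate directions are
    generated by A-orthogonalizing the standard basis vectors (a naive,
    purely illustrative choice).  In exact arithmetic the loop produces
    the solution of A x = b after n steps."""
    n = b.shape[0]
    x = np.zeros(n)
    r = b.copy()
    directions = []                     # pairs (p^(j), A p^(j)) computed so far
    for k in range(n):
        p = np.zeros(n)
        p[k] = 1.0                      # start from the standard basis vector e_k
        for d, Ad in directions:
            p = p - ((Ad @ p) / (Ad @ d)) * d    # enforce p^T A d = 0
        Ap = A @ p
        directions.append((p, Ap))
        alpha = (p @ r) / (p @ Ap)      # alpha_k (a harmless no-op step if p^T r = 0)
        x = x + alpha * p
        r = r - alpha * Ap
    return x
```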

The Conjugate Gradient Method chooses the search direction to equal the vector $p^{(k)}$ that is A-conjugate to all previous search directions and is closest to the direction of steepest descent:

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b \\ x^{(0)} := 0 \\ r^{(0)} := b \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ {\bf if~} k = 0 \\ ~~~ ~~~ p^{(k)} = r^{(0)} \\ ~~~ {\bf else} \\ ~~~ ~~~ p^{(k)} := \arg \min_{p \perp \Span( Ap^{(0)}, \ldots, Ap^{(k-1)} )} \| r^{(k)} - p \|_2 \\ ~~~ {\bf endif} \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\ ~~~ k := k+1 \\ {\bf endwhile} \end{array} \end{equation*}

The various vectors that appear in the Conjugate Gradient Method have the following properties: If $P^{(k-1)} = \left( \begin{array}{c c c} p^{(0)} & \cdots & p^{(k-1)} \end{array} \right)$ then

• $\displaystyle P^{(k-1)\,T} r^{(k)} = 0.$

• $\displaystyle \Span( p^{(0)}, \ldots, p^{(k-1)} ) = \Span( r^{(0)}, \ldots, r^{(k-1)} ) = \Span( b, A b, \ldots, A^{k-1} b ).$

• The residual vectors $r^{(k)}$ are mutually orthogonal.

• For $k \geq 1$ and some scalar $\gamma_k \text{,}$

\begin{equation*} p^{(k)} = r^{(k)} - \gamma_{k} p^{(k-1)} . \end{equation*}
###### Definition 8.5.2.2. Krylov subspace.

The subspace

\begin{equation*} {\cal K}_k( A, b ) = \Span( b, A b, \ldots, A^{k-1} b ) \end{equation*}

is known as the order-k Krylov subspace.

Alternative Conjugate Gradient Methods are given by

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b \\ x^{(0)} := 0 \\ r^{(0)} := b \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ {\bf if~} k = 0 \\ ~~~ ~~~ p^{(k)} = r^{(0)} \\ ~~~ {\bf else} \\ ~~~ ~~~ \gamma_k := -(p^{(k-1)\,T} A r^{(k)} ) /( p^{(k-1)\,T} A p^{(k-1)} )\\ ~~~ ~~~ p^{(k)} := r^{(k)} + \gamma_k p^{(k-1)} \\ ~~~ {\bf endif} \\ ~~~ \alpha_k := \frac{r^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\ ~~~ k := k + 1 \\ {\bf endwhile} \end{array} \end{equation*}
and

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b \\ x^{(0)} := 0 \\ r^{(0)} := b \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ {\bf if~} k = 0 \\ ~~~ ~~~ p^{(k)} = r^{(0)} \\ ~~~ {\bf else} \\ ~~~ ~~~ \gamma_k := (r^{(k)\,T} r^{(k)}) / (r^{(k-1)\,T} r^{(k-1)}) \\ ~~~ ~~~ p^{(k)} := r^{(k)} + \gamma_k p^{(k-1)} \\ ~~~ {\bf endif} \\ ~~~ \alpha_k := \frac{r^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\ ~~~ k := k + 1 \\ {\bf endwhile} \end{array} \end{equation*}

A practical stopping criterion for the Conjugate Gradient Method is to proceed while $\| r^{(k)} \|_2 > \meps \| b \|_2$ and some maximum number of iterations has not yet been performed.
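A minimal NumPy sketch of the second variant above, with that stopping test (the tolerance `tol` and the iteration cap are illustrative choices, not prescribed by the text):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1.0e-10, max_iter=None):
    """Conjugate Gradient for SPD A, following the second variant above,
    with a relative-residual stopping test (tol is an illustrative choice)."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)                     # x^(0) := 0
    r = b.copy()                        # r^(0) := b
    p = r.copy()                        # p^(0) := r^(0)
    rr = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            break
        q = A @ p
        alpha = rr / (p @ q)            # alpha_k = r^(k)T r^(k) / p^(k)T A p^(k)
        x = x + alpha * p
        r = r - alpha * q
        rr_new = r @ r
        gamma = rr_new / rr             # gamma_{k+1} = r^(k+1)T r^(k+1) / r^(k)T r^(k)
        p = r + gamma * p               # next search direction
        rr = rr_new
    return x
```

In exact arithmetic the loop terminates with the solution after at most $n$ iterations; in floating point the relative-residual test and the iteration cap take over.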

The Conjugate Gradient Method can be accelerated by incorporating a preconditioner, $M \text{,}$ where $M \approx A$ is SPD.
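A sketch of one common way to incorporate the preconditioner into the Conjugate Gradient loop; this particular formulation is standard but is not derived in this unit, and `M_solve` (applying $M^{-1}$), `tol`, and `max_iter` are names introduced for the sketch:

```python
import numpy as np

def preconditioned_cg(A, b, M_solve, tol=1.0e-10, max_iter=None):
    """One common formulation of preconditioned CG (a sketch).
    M_solve(y) should return M^{-1} y for an SPD preconditioner M."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b.copy()
    z = M_solve(r)                      # z^(0) := M^{-1} r^(0)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            break
        q = A @ p
        alpha = rz / (p @ q)
        x = x + alpha * p
        r = r - alpha * q
        z = M_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p       # next search direction
        rz = rz_new
    return x
```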