The idea behind the Conjugate Gradient Method is that in the current iteration we have an approximation, $x^{(k)} \text{,}$ to the solution of $A x = b \text{.}$ By construction, since $x^{(0)} = 0 \text{,}$

\begin{equation*} x^{(k)} = \alpha_0 p^{(0)} + \cdots + \alpha_{k-1} p^{(k-1)} . \end{equation*}

Also, the residual

\begin{equation*} \begin{array}{rcl} r^{(k)} & = & b - A x^{(k)} \\ & = & b - A ( \alpha_0 p^{(0)} + \cdots + \alpha_{k-1} p^{(k-1)} ) \\ & = & b - \alpha_0 A p^{(0)} - \cdots - \alpha_{k-1} A p^{(k-1)} \\ & = & r^{(k-1)} - \alpha_{k-1} A p^{(k-1)} . \end{array} \end{equation*}

If $r^{(k)} = 0 \text{,}$ then we know that $x^{(k)}$ solves $A x = b \text{,}$ and we are done.
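The recurrence $r^{(k)} = r^{(k-1)} - \alpha_{k-1} A p^{(k-1)}$ can be checked numerically. The following sketch builds a small symmetric positive definite system (the matrix, right-hand side, and step length are illustrative assumptions, not taken from the text) and compares the directly computed residual against the recurrence after one step from $x^{(0)} = 0 \text{:}$

```python
import numpy as np

# Hypothetical small SPD system, chosen only for illustration.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)    # symmetric positive definite
b = rng.standard_normal(4)

# One step x^{(1)} = alpha_0 p^{(0)} with x^{(0)} = 0, so r^{(0)} = b.
p0 = b.copy()                             # first search direction
alpha0 = (p0 @ b) / (p0 @ (A @ p0))       # exact line search step length
x1 = alpha0 * p0

# Directly computed residual vs. the recurrence r^{(1)} = r^{(0)} - alpha_0 A p^{(0)}.
r1_direct = b - A @ x1
r1_recurrence = b - alpha0 * (A @ p0)
print(np.allclose(r1_direct, r1_recurrence))  # True
```

Since $x^{(1)} = \alpha_0 p^{(0)} \text{,}$ the two expressions agree exactly in exact arithmetic; the check confirms they agree to floating point accuracy.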

Assume that $r^{(k)} \neq 0 \text{.}$ The question now is "How should we construct a new $p^{(k)}$ that is A-conjugate to the previous search directions and so that $p^{(k)\,T} r^{(k)} \neq 0 \text{?}$" Here are some thoughts:

• We like the direction of steepest descent, $r^{(k)} = b - A x^{(k)} \text{,}$ because it is the direction in which $f( x )$ decreases most quickly.

• Let us choose $p^{(k)}$ to be the vector that is A-conjugate to $p^{(0)}, \ldots , p^{(k-1)}$ and closest to the direction of steepest descent, $r^{(k)} \text{:}$

\begin{equation*} \| p^{(k)} - r^{(k)} \|_2 = \min_{p \perp \Span( Ap^{(0)}, \ldots, Ap^{(k-1)} )} \| r^{(k)} - p \|_2. \end{equation*}

This yields the algorithm in Figure 8.3.3.1.
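Figure 8.3.3.1 is not reproduced here, so the following is a minimal sketch of the standard Conjugate Gradient iteration consistent with the derivation above; details (variable names, stopping criterion, tolerances) are assumptions and may differ from the figure:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, maxiter=None):
    """Sketch of the Conjugate Gradient method for symmetric positive
    definite A. Follows the standard iteration: the residual recurrence
    r = r - alpha * A p and the A-conjugate direction update p = r + beta * p."""
    n = len(b)
    maxiter = maxiter or n
    x = np.zeros(n)          # x^{(0)} = 0, as in the text
    r = b.copy()             # r^{(0)} = b - A x^{(0)} = b
    p = r.copy()             # first direction: steepest descent
    rs_old = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap            # r^{(k)} = r^{(k-1)} - alpha A p^{(k-1)}
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:     # r^{(k)} = 0: x^{(k)} solves A x = b
            break
        beta = rs_new / rs_old        # keeps p^{(k)} A-conjugate to p^{(k-1)}
        p = r + beta * p
        rs_old = rs_new
    return x
```

In exact arithmetic the iteration terminates in at most $n$ steps, since the search directions span the whole space once $k = n \text{.}$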