## Subsection 8.3.1 A-conjugate directions

Let's start our generic descent method algorithm with $x^{(0)} = 0 \text{.}$ Here we do not use the temporary vector $q^{(k)} = A p^{(k)}$ so that later we can emphasize how to cast the Conjugate Gradient Method in terms of as few matrix-vector multiplications as possible (one, to be exact).

Now, since $x^{(0)} = 0 \text{,}$ clearly

\begin{equation*} x^{(k+1)} = \alpha_0 p^{(0)} + \cdots + \alpha_{k} p^{(k)} . \end{equation*}

Thus, $x^{(k+1)} \in \Span( p^{(0)}, \ldots , p^{(k)} ) \text{.}$

It would be nice if after the $k$th iteration

\begin{equation} f( x^{(k+1)} ) = \min_{x \in \Span( p^{(0)}, \ldots , p^{(k)} )} f( x )\label{chapter08-eqn-CG1}\tag{8.3.1} \end{equation}

and the search directions were linearly independent. Then, the resulting descent method, in exact arithmetic, is guaranteed to complete in at most $n$ iterations. This is because then

\begin{equation*} \Span( p^{(0)}, \ldots , p^{(n-1)} ) = \R^n \end{equation*}

so that

\begin{equation*} f( x^{(n)} ) = \min_{x \in \Span( p^{(0)}, \ldots , p^{(n-1)} )} f( x ) = \min_{x \in \R^n} f( x ) \end{equation*}

and hence $A x^{(n)} = b \text{.}$

Unfortunately, the Method of Steepest Descent does not have this property. The next approximation to the solution, $x^{(k+1)} \text{,}$ minimizes $f( x )$ where $x$ is constrained to be on the line $x^{(k)} + \alpha p^{(k)} \text{.}$ Because in each step $f( x^{(k+1)} ) \leq f( x^{(k)} ) \text{,}$ a slightly stronger result holds: It also minimizes $f( x )$ where $x$ is constrained to be on the union of lines $x^{(j)} + \alpha p^{(j)} \text{,}$ $j = 0, \ldots, k \text{.}$ However, unless we pick the search directions very carefully, that is not the same as minimizing $f( x )$ over all vectors in $\Span( p^{(0)}, \ldots , p^{(k)} ) \text{.}$
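This failure is easy to observe numerically. The sketch below (with a made-up $2 \times 2$ SPD system) runs Steepest Descent with exact line search, taking the residual $r = b - A x$ as the search direction; after $n = 2$ steps the residual is still nonzero, so $x^{(2)}$ does not minimize $f$ over the span of the first two search directions.

```python
# Steepest Descent with exact line search on f(x) = 1/2 x^T A x - x^T b,
# using the residual r = b - A x as the search direction (made-up data).
# After n = 2 steps the residual is still nonzero, so x^(2) does not
# minimize f over Span(p^(0), p^(1)).

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[3.0, 1.0],
     [1.0, 2.0]]          # SPD
b = [1.0, 1.0]
x = [0.0, 0.0]            # x^(0) = 0

for k in range(2):
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # p^(k) = r^(k)
    alpha = dot(r, r) / dot(r, matvec(A, r))             # exact line search
    x = [xi + alpha * ri for xi, ri in zip(x, r)]

r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
print(x, r)   # x = [4/21, 8/21], but the solution is [1/5, 2/5]; r != 0
```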

We can write (8.3.1) more concisely: Let

\begin{equation*} P^{(k-1)} = \left( \begin{array}{c c c c} p^{(0)}\amp p^{(1)}\amp \cdots \amp p^{(k-1)} \end{array} \right) \end{equation*}

be the matrix that holds the history of all search directions so far (as its columns). Then, letting

\begin{equation*} a^{(k-1)} = \left( \begin{array}{c} \alpha_0 \\ \vdots \\ \alpha_{k-1} \end{array} \right), \end{equation*}

we notice that

\begin{equation} x^{(k)} = \left( \begin{array}{c c c c} p^{(0)}\amp \cdots \amp p^{(k-1)} \end{array} \right) \left( \begin{array}{c} \alpha_0 \\ \vdots \\ \alpha_{k-1} \end{array} \right) = P^{(k-1)} a^{(k-1)}.\label{chapter08-Pa}\tag{8.3.2} \end{equation}
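As a quick sanity check of (8.3.2), with made-up directions and step sizes, accumulating $\alpha_0 p^{(0)} + \cdots + \alpha_{k-1} p^{(k-1)}$ one direction at a time and forming the matrix-vector product $P^{(k-1)} a^{(k-1)}$ give the same vector:

```python
# Check of (8.3.2): summing alpha_i p^(i) equals the matrix-vector product
# P a, where column j of P is p^(j) (directions and step sizes are made up).

dirs = [[1.0, 0.0, 1.0],    # p^(0)
        [0.0, 1.0, 1.0],    # p^(1)
        [2.0, 1.0, 0.0]]    # p^(2)
alphas = [0.5, -1.0, 2.0]   # alpha_0, alpha_1, alpha_2

# x^(3) accumulated one search direction at a time
x = [0.0, 0.0, 0.0]
for a, p in zip(alphas, dirs):
    x = [xi + a * pi for xi, pi in zip(x, p)]

# entry i of P a, where P[i][j] = dirs[j][i]
Pa = [sum(dirs[j][i] * alphas[j] for j in range(3)) for i in range(3)]

print(x, Pa)   # identical: [4.5, 1.0, -0.5]
```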
###### Homework 8.3.1.1.

Let $p^{(k)}$ be a new search direction that is linearly independent of the columns of $P^{(k-1)} \text{,}$ which themselves are linearly independent. Show that

\begin{equation*} \begin{array}{l} \min_{x \in \Span( p^{(0)}, \ldots , p^{(k-1)}, p^{(k)})} f( x ) = \min_y f( P^{(k)} y ) \\ \\ ~~~~~ = \min_{y } \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b \right. \\ ~~~~~~~~~~~~~ \left. + \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} + \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right], \end{array} \end{equation*}

where $y = \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) \in \R^{k+1} \text{.}$

Hint
\begin{equation*} x \in \Span( p^{(0)}, \ldots , p^{(k-1)}, p^{(k)} ) \end{equation*}

if and only if there exists

\begin{equation*} y = \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) \in \R^{k+1} \mbox{ such that } x = \left( \begin{array}{c|c} P^{(k-1)} \amp p^{(k)} \end{array} \right) \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) . \end{equation*}
Solution
\begin{equation*} \begin{array}{l} \min_{x \in \Span( p^{(0)}, \ldots , p^{(k-1)}, p^{(k)})} f( x ) \\ ~~~=~~~~ \lt \mbox{ equivalent formulation } \gt \\ \min_y f( \left( \begin{array}{c | c} P^{(k-1)} \amp p^{(k)} \end{array} \right) y ) \\ ~~~=~~~~ \lt \mbox{ partition } y = \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) \gt \\ \min_{y} f( \FlaOneByTwo{ P^{(k-1)} }{ p^{(k)} } \FlaTwoByOne{ y_0 }{ \psi_1 } ) \\ ~~~=~~~~ \lt \mbox{ instantiate } f \gt \\ \min_{y} \left[ \frac{1}{2} \left[ \FlaOneByTwo{ P^{(k-1)} }{ p^{(k)} } \FlaTwoByOne{ y_0 }{ \psi_1 } \right] ^T A \FlaOneByTwo{ P^{(k-1)} }{ p^{(k)} } \FlaTwoByOne{ y_0 }{ \psi_1 } \right. \\ ~~~~~~~~~~~~~~ \left. - \left[ \FlaOneByTwo{ P^{(k-1)} }{ p^{(k)} } \FlaTwoByOne{ y_0 }{ \psi_1 } \right]^T b \right] \\ ~~~=~~~~ \lt \mbox{ multiply out } \gt \\ \min_{y} \left[ \frac{1}{2} \left[ y_0^T P^{(k-1)\,T} + \psi_1 p^{(k)\,T} \right] A \left[ P^{(k-1)} y_0 + \psi_1 p^{(k)} \right] - y_0^T P^{(k-1)\,T} b - \psi_1 p^{(k)\,T} b \right] \\ ~~~=~~~~ \lt \mbox{ multiply out some more } \gt \\ \min_{y} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 + \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} \right. \\ ~~~~~~~~ \left. + \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - y_0^T P^{(k-1)\,T} b - \psi_1 p^{(k)\,T} b \right] \\ ~~~=~~~~\lt \mbox{ rearrange } \gt \\ \min_{y} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b + \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} \right. \\ ~~~~~~~~ \left. + \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right]. \end{array} \end{equation*}

Now, if

\begin{equation*} P^{(k-1)\,T} A p^{(k)} = 0 \end{equation*}

then

\begin{equation*} \begin{array}{l} \min_{x \in \Span( p^{(0)}, \ldots , p^{(k-1)}, p^{(k)} )} f( x ) \\ ~~~=~~~~ \lt \mbox{ from before } \gt \\ \min_{y} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b \right. \\ ~~~~~~~~ + \begin{array}[t]{c} \underbrace{ \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} } \\ 0 \end{array} + \left. \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right] \\ ~~~=~~~~ \lt \mbox{ remove zero term } \gt \\ \min_{y} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b \right. \\ ~~~~~~~~ \left. \phantom{ + \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} } + \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right]\\ ~~~=~~~~ \lt \mbox{ split into two terms that can be minimized separately } \gt \\ \min_{y_0} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b \right] + \min_{\psi_1} \left[ \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right] \\ ~~~=~~~~ \lt \mbox{ recognize first set of terms as } f( P^{(k-1)} y_0 ) \gt \\ \min_{x \in \Span( p^{(0)}, \ldots , p^{(k-1)} )} f( x ) + \min_{\psi_1} \left[ \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right]. \end{array} \end{equation*}

The minimizing $\psi_1 \text{,}$ found by setting the derivative $\psi_1 p^{(k)\,T} A p^{(k)} - p^{(k)\,T} b$ to zero, is given by

\begin{equation*} \psi_1 = \frac{p^{(k)\,T} b}{p^{(k)\,T} A p^{(k)} } . \end{equation*}
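A quick numerical check of this closed-form minimizer, with a made-up SPD matrix $A \text{,}$ right-hand side $b \text{,}$ and direction $p \text{:}$ the quadratic $g( \psi ) = \frac{1}{2} \psi^2 p^T A p - \psi p^T b$ is smallest at $\psi = p^T b / ( p^T A p ) \text{.}$

```python
# Check that psi_1 = p^T b / (p^T A p) minimizes the scalar quadratic
# g(psi) = 1/2 psi^2 p^T A p - psi p^T b (A, b, p are made-up example data).

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[4.0, 1.0],
     [1.0, 3.0]]          # SPD
b = [1.0, 2.0]
p = [1.0, -1.0]           # a hypothetical search direction

pAp = dot(p, matvec(A, p))     # > 0 since A is SPD, so g is convex
psi = dot(p, b) / pAp          # the claimed minimizer

def g(s):
    return 0.5 * s * s * pAp - s * dot(p, b)

# nearby values of psi all give a larger g
assert all(g(psi) < g(psi + d) for d in (-0.1, -0.01, 0.01, 0.1))
print(psi)    # -0.2 for this data
```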

If we pick the minimizing $y_0$ (so that, by (8.3.1), $x^{(k)} = P^{(k-1)} y_0$) and set $\alpha_k = \psi_1 \text{,}$ then

\begin{equation*} x^{(k+1)} =P^{(k-1)} y_0 + \psi_1 p^{(k)} = \alpha_0 p^{(0)} + \cdots + \alpha_{k-1} p^{(k-1)} + \alpha_k p^{(k)} = x^{(k)} + \alpha_k p^{(k)}. \end{equation*}

A sequence of such directions is said to be A-conjugate.

###### Definition 8.3.1.2. A-conjugate directions.

Let $A$ be SPD. A sequence $p^{(0)}, \ldots, p^{(k-1)} \in \R^n$ such that $p^{(j)\,T} A p^{(i)} = 0$ if and only if $j \neq i$ is said to be A-conjugate.
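One standard way to produce such a sequence, not derived in this unit, is Gram-Schmidt in the inner product $\langle u, v \rangle_A = u^T A v \text{:}$ starting from linearly independent vectors, subtract from each the A-components along the directions already built. The sketch below (with a made-up SPD matrix) applies this to the standard basis of $\R^3$ and checks the definition.

```python
# Constructing A-conjugate directions by Gram-Schmidt in the A-inner product
# <u, v>_A = u^T A v, applied to the standard basis of R^3, then checking
# Definition 8.3.1.2. This construction is not derived in the text here; it
# is one standard way to produce such directions. The matrix A is made up.

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]      # SPD

dirs = []
for j in range(3):
    v = [1.0 if i == j else 0.0 for i in range(3)]       # start from e_j
    for p in dirs:                                        # remove A-components
        coeff = dot(p, matvec(A, v)) / dot(p, matvec(A, p))
        v = [vi - coeff * pi for vi, pi in zip(v, p)]
    dirs.append(v)

# p^(j)^T A p^(i) is zero for j != i and positive for j == i
for i in range(3):
    for j in range(3):
        pAq = dot(dirs[j], matvec(A, dirs[i]))
        if i == j:
            assert pAq > 0
        else:
            assert abs(pAq) < 1e-12
print(dirs)
```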

###### Homework 8.3.1.2.

Let $A \in \R^{n \times n}$ be SPD.

ALWAYS/SOMETIMES/NEVER: The columns of $P \in \R^{n \times k}$ are A-conjugate if and only if $P^T A P = D$ where $D$ is diagonal and has positive values on its diagonal.

Answer
ALWAYS

Now prove it.

Solution
\begin{equation*} \begin{array}{l} P^T A P \\ ~~~ = ~~~~ \lt \mbox{ partition } P \mbox{ by columns }\gt \\ \left( \begin{array}{c | c | c} p_0 \amp \cdots \amp p_{k-1} \end{array} \right)^T A \left( \begin{array}{c | c | c} p_0 \amp \cdots \amp p_{k-1} \end{array} \right) \\ ~~~ = ~~~~ \lt \mbox{ transpose } \gt \\ \left( \begin{array}{c} p_0^T \\ \vdots \\ p_{k-1}^T \end{array} \right) A \left( \begin{array}{c | c | c} p_0 \amp \cdots \amp p_{k-1} \end{array} \right) \\ ~~~ = ~~~~ \lt \mbox{ multiply out } \gt \\ \left( \begin{array}{c} p_0^T \\ \vdots \\ p_{k-1}^T \end{array} \right) \left( \begin{array}{c | c | c} A p_0 \amp \cdots \amp A p_{k-1} \end{array} \right) \\ ~~~ = ~~~~ \lt \mbox{ multiply out } \gt \\ \left( \begin{array}{c | c | c | c } p_0^T A p_0 \amp p_0^T A p_1 \amp \cdots \amp p_0^T A p_{k-1} \\ \hline p_1^T A p_0 \amp p_1^T A p_1 \amp \cdots \amp p_1^T A p_{k-1} \\ \hline \vdots \amp \amp \vdots \\ \hline p_{k-1}^T A p_0 \amp p_{k-1}^T A p_1 \amp \cdots \amp p_{k-1}^T A p_{k-1} \end{array} \right) \\ ~~~ = ~~~~ \lt A = A^T \gt \\ \left( \begin{array}{c | c | c | c } p_0^T A p_0 \amp p_1^T A p_0 \amp \cdots \amp p_{k-1}^T A p_0 \\ \hline p_1^T A p_0 \amp p_1^T A p_1 \amp \cdots \amp p_{k-1}^T A p_1 \\ \hline \vdots \amp \amp \vdots \\ \hline p_{k-1}^T A p_0 \amp p_{k-1}^T A p_1 \amp \cdots \amp p_{k-1}^T A p_{k-1} \end{array} \right) \end{array} \end{equation*}

Now, if the columns of $P$ are A-conjugate, then

\begin{equation*} \begin{array}{l} \left( \begin{array}{c | c | c | c } p_0^T A p_0 \amp p_1^T A p_0 \amp \cdots \amp p_{k-1}^T A p_0 \\ \hline p_1^T A p_0 \amp p_1^T A p_1 \amp \cdots \amp p_{k-1}^T A p_1 \\ \hline \vdots \amp \amp \vdots \\ \hline p_{k-1}^T A p_0 \amp p_{k-1}^T A p_1 \amp \cdots \amp p_{k-1}^T A p_{k-1} \end{array} \right) ~~~ = ~~~~ \lt \mbox{ the columns of } P \mbox{ are A-conjugate } \gt \\ \left( \begin{array}{c | c | c | c} p_0^T A p_0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp p_1^T A p_1 \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp p_{k-1}^T A p_{k-1} \end{array} \right), \end{array} \end{equation*}

and hence $P^T A P$ is diagonal.

If, on the other hand, $P^T A P$ is diagonal, then the columns of $P$ are A-conjugate.
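The claim is easy to check on a small example (all data made up): for columns that are not A-conjugate, $P^T A P$ has nonzero off-diagonal entries, while for A-conjugate columns it is diagonal with a positive diagonal.

```python
# Checking Homework 8.3.1.2 on a 2x2 example (made-up data): P^T A P is
# diagonal with a positive diagonal exactly when the columns of P are
# A-conjugate.

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def PtAP(A, cols):
    """Form P^T A P, where cols[j] is column j of P."""
    k = len(cols)
    return [[dot(cols[i], matvec(A, cols[j])) for j in range(k)]
            for i in range(k)]

A = [[2.0, 1.0],
     [1.0, 3.0]]     # SPD, not diagonal

# e_0 and e_1 are NOT A-conjugate (e_0^T A e_1 = 1), so P^T A P is not diagonal:
M = PtAP(A, [[1.0, 0.0], [0.0, 1.0]])

# p^(0) = e_0 and p^(1) = e_1 - (1/2) e_0 ARE A-conjugate:
D = PtAP(A, [[1.0, 0.0], [-0.5, 1.0]])
print(M, D)   # M = A itself; D = [[2.0, 0.0], [0.0, 2.5]]
```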

###### Homework 8.3.1.3.

Let $A \in \R^{n \times n}$ be SPD and the columns of $P \in \R^{n \times k}$ be A-conjugate.

ALWAYS/SOMETIMES/NEVER: The columns of $P$ are linearly independent.

Answer
ALWAYS

Now prove it!

Solution

We employ a proof by contradiction. Suppose the columns of $P$ are not linearly independent. Then there exists $y \neq 0$ such that $P y = 0 \text{.}$ Let $D = P^T A P \text{.}$ From the last homework we know that $D$ is diagonal and has positive diagonal elements. But then

\begin{equation*} \begin{array}{l} 0 \\ ~~~=~~~~ \lt P y = 0 \gt \\ ( P y )^T A ( P y ) \\ ~~~=~~~~ \lt \mbox{ multiply out } \gt \\ y^T P^T A P y \\ ~~~=~~~~ \lt P^T A P = D \gt \\ y^T D y \\ ~~~\gt ~~~~ \lt D \mbox{ is SPD} \gt \\ 0, \end{array} \end{equation*}

which is a contradiction. Hence, the columns of $P$ are linearly independent.

The above observations leave us with a descent method that picks the search directions to be A-conjugate, given in Figure 8.3.1.3.
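A sketch of such a method in code (not the figure itself): with $x^{(0)} = 0$ and A-conjugate directions, each step uses $\alpha_k = p^{(k)\,T} b / ( p^{(k)\,T} A p^{(k)} )$ as derived above, and in exact arithmetic the iteration solves $A x = b$ in at most $n$ steps. Here the directions are obtained by Gram-Schmidt in the A-inner product applied to the standard basis, a stand-in choice (the Conjugate Gradient Method's particular directions come later); the example system is made up.

```python
# A sketch of the descent method with A-conjugate search directions: x^(0) = 0
# and alpha_k = p^(k)^T b / (p^(k)^T A p^(k)). The directions come from
# Gram-Schmidt in the A-inner product on the standard basis (a stand-in
# choice; CG's particular directions come later). The data is made up.
# In exact arithmetic the method solves A x = b in at most n steps.

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_directions(A, n):
    """A-conjugate directions via Gram-Schmidt on e_0, ..., e_{n-1}."""
    dirs = []
    for j in range(n):
        v = [1.0 if i == j else 0.0 for i in range(n)]
        for p in dirs:
            coeff = dot(p, matvec(A, v)) / dot(p, matvec(A, p))
            v = [vi - coeff * pi for vi, pi in zip(v, p)]
        dirs.append(v)
    return dirs

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]     # SPD
b = [1.0, 2.0, 3.0]

x = [0.0, 0.0, 0.0]       # x^(0) = 0
for p in conjugate_directions(A, 3):
    alpha = dot(p, b) / dot(p, matvec(A, p))     # alpha_k = psi_1
    x = [xi + alpha * pi for xi, pi in zip(x, p)]

r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
print(x, r)               # x solves A x = b; r is zero up to roundoff
```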

###### Remark 8.3.1.4.

The important observation is that if $p^{(0)}, \ldots , p^{(k)}$ are chosen to be A-conjugate, then $x^{(k+1)}$ not only minimizes

\begin{equation*} f( x^{(k)} + \alpha p^{(k)} ) \end{equation*}

over $\alpha \text{,}$ but also solves

\begin{equation*} \min_{x \in \Span( p^{(0)}, \ldots , p^{(k)} )} f( x ). \end{equation*}