
Subsection 8.3.1 A-conjugate directions

Let's start our generic descent method algorithm with \(x^{(0)} = 0 \text{.}\) Here we do not use the temporary vector \(q^{(k)} = A p^{(k)} \text{,}\) so that later we can emphasize how to cast the Conjugate Gradient Method in terms of as few matrix-vector multiplications as possible (one per iteration, to be exact).

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b \\ x^{(0)} := 0 \\ r^{(0)} := b - A x^{(0)} (= b) \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := \mbox{next direction}\\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\ ~~~ k:= k+1 \\ {\bf endwhile} \end{array} \end{equation*}
\begin{equation*} \begin{array}{l} {\bf Given:~} A, b \\ x := 0 \\ r := b \\ \phantom{k:= 0} \\ {\bf while~} r \neq 0 \\ ~~~ p := \mbox{next direction}\\ ~~~ \alpha := \frac{p^T r}{p^T A p} \\ ~~~ x := x + \alpha p \\ ~~~ r := r - \alpha A p \\ \phantom{k:= k+1} \\ {\bf endwhile} \end{array} \end{equation*}
Figure 8.3.1.1. Generic descent algorithm started with \(x^{(0)} = 0 \text{.}\) Left: with indices. Right: without indices.
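To make the loop concrete, here is a minimal NumPy sketch of the algorithm on the left (our own illustration; the names `descent` and `next_direction` are hypothetical, since the method only becomes specific once "next direction" is chosen):

```python
import numpy as np

def descent(A, b, next_direction, tol=1e-12, max_iter=1000):
    """A sketch of the generic descent method of Figure 8.3.1.1.

    next_direction(A, r, history) is a placeholder that supplies p^(k);
    plugging in different strategies yields different descent methods.
    """
    x = np.zeros_like(b)               # x^(0) = 0
    r = b.copy()                       # r^(0) = b - A x^(0) = b
    history = []                       # past search directions
    # in floating point we test ||r|| <= tol rather than r = 0
    while np.linalg.norm(r) > tol and len(history) < max_iter:
        p = next_direction(A, r, history)
        Ap = A @ p                     # the one matrix-vector multiply per iteration
        alpha = (p @ r) / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap             # update r^(k+1) without recomputing b - A x
        history.append(p)
    return x
```

For example, passing `next_direction=lambda A, r, history: r` recovers the Method of Steepest Descent.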

Now, since \(x^{(0)} = 0 \text{,}\) clearly

\begin{equation*} x^{(k+1)} = \alpha_0 p^{(0)} + \cdots + \alpha_{k} p^{(k)} . \end{equation*}

Thus, \(x^{(k+1)} \in \Span( p^{(0)}, \ldots , p^{(k)} ) \text{.}\)

It would be nice if after the \(k \)th iteration

\begin{equation} f( x^{(k+1)} ) = \min_{x \in \Span( p^{(0)}, \ldots , p^{(k)} )} f( x )\label{chapter08-eqn-CG1}\tag{8.3.1} \end{equation}

and the search directions were linearly independent. Then, the resulting descent method, in exact arithmetic, is guaranteed to complete in at most \(n \) iterations. This is because then

\begin{equation*} \Span( p^{(0)}, \ldots , p^{(n-1)} ) = \R^n \end{equation*}

so that

\begin{equation*} f( x^{(n)} ) = \min_{x \in \Span( p^{(0)}, \ldots , p^{(n-1)} )} f( x ) = \min_{x \in \R^n} f( x ) \end{equation*}

and hence \(A x^{(n)} = b \text{.}\)

Unfortunately, the Method of Steepest Descent does not have this property. The next approximation to the solution, \(x^{(k+1)} \text{,}\) minimizes \(f( x ) \) where \(x \) is constrained to be on the line \(x^{(k)} + \alpha p^{(k)} \text{.}\) Because in each step \(f( x^{(k+1)} ) \leq f( x^{(k)} ) \text{,}\) a slightly stronger result holds: it also minimizes \(f( x ) \) where \(x \) is constrained to be on the union of lines \(x^{(j)} + \alpha p^{(j)} \text{,}\) \(j = 0, \ldots, k \text{.}\) However, unless we pick the search directions very carefully, that is not the same as minimizing over all vectors in \(\Span( p^{(0)}, \ldots , p^{(k)} ) \text{.}\)
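The following small experiment (our own, not from the text) illustrates this for the Method of Steepest Descent, where \(p^{(k)} = r^{(k)} \text{:}\) after two steps on a \(2 \times 2 \) SPD system the search directions span \(\R^2 \text{,}\) yet \(x^{(2)} \) is not the exact solution, so (8.3.1) does not hold:

```python
import numpy as np

# Two steps of steepest descent on a 2x2 SPD system.
A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])

x = np.zeros(2)
r = b.copy()
for k in range(2):
    p = r.copy()                       # steepest descent: p^(k) = r^(k)
    Ap = A @ p
    alpha = (p @ r) / (p @ Ap)
    x = x + alpha * p
    r = r - alpha * Ap

print(x)                               # [0.25, 0.75]: not the solution yet
print(np.linalg.solve(A, b))           # [0.3333..., 1.0]: the exact solution
```

Here \(p^{(0)} = (1,1)^T \) and \(p^{(1)} = (-0.5, 0.5)^T \) are linearly independent, so a method with property (8.3.1) would have finished in two iterations.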

We can write (8.3.1) more concisely: Let

\begin{equation*} P^{(k-1)} = \left( \begin{array}{c c c c} p^{(0)}\amp p^{(1)}\amp \cdots \amp p^{(k-1)} \end{array} \right) \end{equation*}

be the matrix that holds the history of all search directions so far (as its columns). Then, letting

\begin{equation*} a^{(k-1)} = \left( \begin{array}{c} \alpha_0 \\ \vdots \\ \alpha_{k-1} \end{array} \right), \end{equation*}

we notice that

\begin{equation} x^{(k)} = \left( \begin{array}{c c c} p^{(0)}\amp \cdots \amp p^{(k-1)} \end{array} \right) \left( \begin{array}{c} \alpha_0 \\ \vdots \\ \alpha_{k-1} \end{array} \right) = P^{(k-1)} a^{(k-1)}.\label{chapter08-Pa}\tag{8.3.2} \end{equation}
Homework 8.3.1.1.

Let \(p^{(k)} \) be a new search direction that is linearly independent of the columns of \(P^{(k-1)} \text{,}\) which themselves are linearly independent. Show that

\begin{equation*} \begin{array}{l} \min_{x \in \Span( p^{(0)}, \ldots , p^{(k-1)}, p^{(k)})} f( x ) = \min_y f( P^{(k)} y ) \\ \\ ~~~~~ = \min_{y } \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b \right. \\ ~~~~~~~~~~~~~ \left. + \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} + \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right], \end{array} \end{equation*}

where \(y = \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) \in \R^{k+1} \text{.}\)

Hint
\begin{equation*} x \in \Span( p^{(0)}, \ldots , p^{(k-1)}, p^{(k)} ) \end{equation*}

if and only if there exists

\begin{equation*} y = \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) \in \R^{k+1} \mbox{ such that } x = \left( \begin{array}{c|c} P^{(k-1)} \amp p^{(k)} \end{array} \right) \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) . \end{equation*}
Solution
\begin{equation*} \begin{array}{l} \min_{x \in \Span( p^{(0)}, \ldots , p^{(k-1)}, p^{(k)})} f( x ) \\ ~~~=~~~~ \lt \mbox{ equivalent formulation } \gt \\ \min_y f( \left( \begin{array}{c | c} P^{(k-1)} \amp p^{(k)} \end{array} \right) y ) \\ ~~~=~~~~ \lt \mbox{ partition } y = \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) \gt \\ \min_{y} f( \left( \begin{array}{c | c} P^{(k-1)} \amp p^{(k)} \end{array} \right) \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) ) \\ ~~~=~~~~ \lt \mbox{ instantiate } f \gt \\ \min_{y} \left[ \frac{1}{2} \left[ \left( \begin{array}{c | c} P^{(k-1)} \amp p^{(k)} \end{array} \right) \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) \right]^T A \left( \begin{array}{c | c} P^{(k-1)} \amp p^{(k)} \end{array} \right) \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) \right. \\ ~~~~~~~~~~~~~~ \left. - \left[ \left( \begin{array}{c | c} P^{(k-1)} \amp p^{(k)} \end{array} \right) \left( \begin{array}{c} y_0 \\ \hline \psi_1 \end{array} \right) \right]^T b \right] \\ ~~~=~~~~ \lt \mbox{ multiply out } \gt \\ \min_{y} \left[ \frac{1}{2} \left[ y_0^T P^{(k-1)\,T} + \psi_1 p^{(k)\,T} \right] A \left[ P^{(k-1)} y_0 + \psi_1 p^{(k)} \right] - y_0^T P^{(k-1)\,T} b - \psi_1 p^{(k)\,T} b \right] \\ ~~~=~~~~ \lt \mbox{ multiply out some more } \gt \\ \min_{y} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 + \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} \right. \\ ~~~~~~~~ \left. + \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - y_0^T P^{(k-1)\,T} b - \psi_1 p^{(k)\,T} b \right] \\ ~~~=~~~~\lt \mbox{ rearrange } \gt \\ \min_{y} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b + \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} \right. \\ ~~~~~~~~ \left. + \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right]. \end{array} \end{equation*}

Now, if

\begin{equation*} P^{(k-1)\,T} A p^{(k)} = 0 \end{equation*}

then

\begin{equation*} \begin{array}{l} \min_{x \in \Span( p^{(0)}, \ldots , p^{(k-1)}, p^{(k)} )} f( x ) \\ ~~~=~~~~ \lt \mbox{ from before } \gt \\ \min_{y} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b \right. \\ ~~~~~~~~ + \begin{array}[t]{c} \underbrace{ \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} } \\ 0 \end{array} + \left. \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right] \\ ~~~=~~~~ \lt \mbox{ remove zero term } \gt \\ \min_{y} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b \right. \\ ~~~~~~~~ \left. \phantom{ + \psi_1 y_0^T P^{(k-1)\,T} A p^{(k)} } + \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right]\\ ~~~=~~~~ \lt \mbox{ split into two terms that can be minimized separately } \gt \\ \min_{y_0} \left[ \frac{1}{2} y_0^T P^{(k-1)\,T} A P^{(k-1)} y_0 - y_0^T P^{(k-1)\,T} b \right] + \min_{\psi_1} \left[ \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right] \\ ~~~=~~~~ \lt \mbox{ recognize the first set of terms as } \min_{y_0} f( P^{(k-1)} y_0 ) \gt \\ \min_{x \in \Span( p^{(0)}, \ldots , p^{(k-1)} )} f( x ) + \min_{\psi_1} \left[ \frac{1}{2} \psi_1^2 p^{(k)\,T} A p^{(k)} - \psi_1 p^{(k)\,T} b \right]. \end{array} \end{equation*}

The minimizing \(\psi_1 \) is given by

\begin{equation*} \psi_1 = \frac{p^{(k)\,T} b}{p^{(k)\,T} A p^{(k)} } . \end{equation*}

Notice that the first minimization is over \(\Span( p^{(0)}, \ldots , p^{(k-1)} ) \text{,}\) which by assumption is attained by \(x^{(k)} = P^{(k-1)} a^{(k-1)} \text{,}\) so the minimizing \(y_0 \) equals \(a^{(k-1)} \text{.}\) Also, since \(r^{(k)} = b - A P^{(k-1)} a^{(k-1)} \) and \(P^{(k-1)\,T} A p^{(k)} = 0 \text{,}\) we have \(p^{(k)\,T} b = p^{(k)\,T} r^{(k)} \text{,}\) which matches the formula for \(\alpha_k \) in the algorithm. If we set \(\alpha_k = \psi_1 \text{,}\) then

\begin{equation*} x^{(k+1)} =P^{(k-1)} y_0 + \psi_1 p^{(k)} = \alpha_0 p^{(0)} + \cdots + \alpha_{k-1} p^{(k-1)} + \alpha_k p^{(k)} = x^{(k)} + \alpha_k p^{(k)}. \end{equation*}

A sequence of such directions is said to be A-conjugate.

Definition 8.3.1.2. A-conjugate directions.

Let \(A \) be SPD. A sequence \(p^{(0)}, \ldots, p^{(k-1)} \in \Rn \) such that \(p^{(j)\,T} A p^{(i)} = 0\) if and only if \(j \neq i \) is said to be A-conjugate.
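This definition is easy to test numerically. Below is a small sketch (the helper name `is_A_conjugate` and the tolerance are our own choices); it anticipates Homework 8.3.1.2 by checking that \(P^T A P \) is diagonal with a positive diagonal, up to roundoff:

```python
import numpy as np

def is_A_conjugate(A, P, tol=1e-10):
    # The columns of P are A-conjugate iff P^T A P is diagonal
    # with positive diagonal entries (see Homework 8.3.1.2).
    D = P.T @ A @ P
    off_diag = D - np.diag(np.diag(D))
    return np.all(np.abs(off_diag) <= tol) and np.all(np.diag(D) > 0)
```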

Homework 8.3.1.2.

Let \(A \in \R^{n \times n}\) be SPD.

ALWAYS/SOMETIMES/NEVER: The columns of \(P \in \R^{n \times k}\) are A-conjugate if and only if \(P^T A P = D \) where \(D \) is diagonal and has positive values on its diagonal.

Answer

ALWAYS

Now prove it.

Solution
\begin{equation*} \begin{array}{l} P^T A P \\ ~~~ = ~~~~ \lt \mbox{ partition } P \mbox{ by columns }\gt \\ \left( \begin{array}{c | c | c} p_0 \amp \cdots \amp p_{k-1} \end{array} \right)^T A \left( \begin{array}{c | c | c} p_0 \amp \cdots \amp p_{k-1} \end{array} \right) \\ ~~~ = ~~~~ \lt \mbox{ transpose } \gt \\ \left( \begin{array}{c} p_0^T \\ \vdots \\ p_{k-1}^T \end{array} \right) A \left( \begin{array}{c | c | c} p_0 \amp \cdots \amp p_{k-1} \end{array} \right) \\ ~~~ = ~~~~ \lt \mbox{ multiply out } \gt \\ \left( \begin{array}{c} p_0^T \\ \vdots \\ p_{k-1}^T \end{array} \right) \left( \begin{array}{c | c | c} A p_0 \amp \cdots \amp A p_{k-1} \end{array} \right) \\ ~~~ = ~~~~ \lt \mbox{ multiply out } \gt \\ \left( \begin{array}{c | c | c | c } p_0^T A p_0 \amp p_0^T A p_1 \amp \cdots \amp p_0^T A p_{k-1} \\ \hline p_1^T A p_0 \amp p_1^T A p_1 \amp \cdots \amp p_1^T A p_{k-1} \\ \hline \vdots \amp \amp \vdots \\ \hline p_{k-1}^T A p_0 \amp p_{k-1}^T A p_1 \amp \cdots \amp p_{k-1}^T A p_{k-1} \end{array} \right) \\ ~~~ = ~~~~ \lt A = A^T \gt \\ \left( \begin{array}{c | c | c | c } p_0^T A p_0 \amp p_1^T A p_0 \amp \cdots \amp p_{k-1}^T A p_0 \\ \hline p_1^T A p_0 \amp p_1^T A p_1 \amp \cdots \amp p_{k-1}^T A p_1 \\ \hline \vdots \amp \amp \vdots \\ \hline p_{k-1}^T A p_0 \amp p_{k-1}^T A p_1 \amp \cdots \amp p_{k-1}^T A p_{k-1} \end{array} \right) \end{array} \end{equation*}

Now, if the columns of \(P \) are A-conjugate, then

\begin{equation*} \begin{array}{l} \left( \begin{array}{c | c | c | c } p_0^T A p_0 \amp p_1^T A p_0 \amp \cdots \amp p_{k-1}^T A p_0 \\ \hline p_1^T A p_0 \amp p_1^T A p_1 \amp \cdots \amp p_{k-1}^T A p_1 \\ \hline \vdots \amp \amp \vdots \\ \hline p_{k-1}^T A p_0 \amp p_{k-1}^T A p_1 \amp \cdots \amp p_{k-1}^T A p_{k-1} \end{array} \right) ~~~ = ~~~~ \lt \mbox{ off-diagonal entries vanish by A-conjugacy } \gt \\ \left( \begin{array}{c | c | c | c} p_0^T A p_0 \amp 0 \amp \cdots \amp 0 \\ \hline 0 \amp p_1^T A p_1 \amp \cdots \amp 0 \\ \hline \vdots \amp \vdots \amp \ddots \amp \vdots \\ \hline 0 \amp 0 \amp \cdots \amp p_{k-1}^T A p_{k-1} \end{array} \right), \end{array} \end{equation*}

and hence \(P^T A P \) is diagonal. Moreover, its diagonal elements \(p_j^T A p_j \) are positive: by the "if and only if" in Definition 8.3.1.2, \(p_j^T A p_j \neq 0 \text{,}\) so \(p_j \neq 0 \) and, since \(A \) is SPD, \(p_j^T A p_j \gt 0 \text{.}\)

If, on the other hand, \(P^T A P \) is diagonal with positive diagonal elements, then \(p_j^T A p_i = 0 \) whenever \(j \neq i \) and \(p_j^T A p_j \gt 0 \text{,}\) and hence the columns of \(P \) are A-conjugate.
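As a concrete (hypothetical) example: the eigenvectors of a symmetric matrix are mutually orthogonal, and hence A-conjugate, since \(p_i^T A p_j = \lambda_j \, p_i^T p_j = 0 \) for \(i \neq j \text{.}\) The sketch below verifies this numerically for a random SPD matrix, and also checks the linear independence claimed in the next homework:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # an SPD test matrix

# eigh returns orthonormal eigenvectors of the symmetric matrix A
_, P = np.linalg.eigh(A)

D = P.T @ A @ P
print(np.allclose(D, np.diag(np.diag(D))))   # True: P^T A P is diagonal
print(np.all(np.diag(D) > 0))                # True: positive diagonal entries
print(np.linalg.matrix_rank(P) == n)         # True: columns linearly independent
```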

Homework 8.3.1.3.

Let \(A \in \R^{n \times n}\) be SPD and the columns of \(P \in \R^{n \times k}\) be A-conjugate.

ALWAYS/SOMETIMES/NEVER: The columns of \(P \) are linearly independent.

Answer

ALWAYS

Now prove it!

Solution

We employ a proof by contradiction. Suppose the columns of \(P \) are not linearly independent. Then there exists \(y \neq 0 \) such that \(P y = 0 \text{.}\) Let \(D = P^T A P \text{.}\) From the last homework we know that \(D \) is diagonal and has positive diagonal elements. But then

\begin{equation*} \begin{array}{l} 0 \\ ~~~=~~~~ \lt P y = 0 \gt \\ ( P y )^T A ( P y ) \\ ~~~=~~~~ \lt \mbox{ multiply out } \gt \\ y^T P^T A P y \\ ~~~=~~~~ \lt P^T A P = D \gt \\ y^T D y \\ ~~~\gt ~~~~ \lt D \mbox{ is SPD and } y \neq 0 \gt \\ 0, \end{array} \end{equation*}

which is a contradiction. Hence, the columns of \(P\) are linearly independent.

The above observations leave us with a descent method that picks the search directions to be A-conjugate, given in Figure 8.3.1.3.

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b \\ x^{(0)} := 0 \\ r^{(0)} := b \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ \mbox{Choose } p^{(k)} \mbox{ such that } p^{(k)\,T} A P^{(k-1)} = 0 \mbox{ and } p^{(k)\,T} r^{(k)} \neq 0 \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\ ~~~ k:= k+1 \\ {\bf endwhile} \end{array} \end{equation*}
Figure 8.3.1.3. Basic method that chooses the search directions to be A-conjugate.
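A minimal NumPy sketch of this method follows (our own illustration, under the assumption that \(A \) is SPD). Each new direction is obtained by A-orthogonalizing the current residual against all previous directions with a Gram-Schmidt-like sweep; storing and revisiting the full history is expensive, which is precisely what the Conjugate Gradient Method will later avoid:

```python
import numpy as np

def conjugate_directions(A, b, tol=1e-12):
    # A sketch of Figure 8.3.1.3: p^(k) is the residual, A-orthogonalized
    # against all previous search directions.
    n = b.shape[0]
    x = np.zeros(n)
    r = b.copy()
    P, AP = [], []                       # history of p^(j) and A p^(j)
    for k in range(n):                   # at most n iterations in exact arithmetic
        if np.linalg.norm(r) <= tol:
            break
        p = r.copy()
        for pj, Apj in zip(P, AP):       # enforce p^T A p^(j) = 0 for all j < k
            p -= ((Apj @ r) / (Apj @ pj)) * pj
        Ap = A @ p
        alpha = (p @ r) / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        P.append(p)
        AP.append(Ap)
    return x

# Usage sketch: on an SPD system the solution is reached in at most n steps.
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)
b = rng.standard_normal(6)
print(np.allclose(conjugate_directions(A, b), np.linalg.solve(A, b)))  # True
```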
Remark 8.3.1.4.

The important observation is that if \(p^{(0)}, \ldots , p^{(k)} \) are chosen to be A-conjugate, then \(x^{(k+1)} \) not only minimizes

\begin{equation*} f( x^{(k)} + \alpha p^{(k)} ) \end{equation*}

over all \(\alpha \text{,}\) but also attains

\begin{equation*} \min_{x \in \Span( p^{(0)}, \ldots , p^{(k)} )} f( x ). \end{equation*}