Unit 8.2.1 Basics of descent methods

Remark 8.2.1.1.

In the video, the quadratic polynomial pictured takes on the value \(- \frac{1}{2} \widehat x^T A \widehat x \) at \(\widehat x \) and that minimum is below the x-axis. This does not change the conclusions that are drawn in the video.

The basic idea behind a descent method is that at the \(k\)th iteration one has an approximation to \(x \text{,}\) \(x^{(k)} \text{,}\) and one would like to create a better approximation, \(x^{(k+1)} \text{.}\) To do so, the method picks a search direction, \(p^{(k)} \text{,}\) and chooses the next approximation by taking a step from the current approximate solution in the direction of \(p^{(k)} \text{:}\)

\begin{equation*} x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}. \end{equation*}

In other words, one searches for a minimum along a line defined by the current iterate, \(x^{(k)} \text{,}\) and the search direction, \(p^{(k)} \text{.}\) One then picks \(\alpha_k\) so that, preferably, \(f( x^{(k+1)} ) \leq f( x^{(k)} ) \text{.}\) This is summarized in Figure 8.2.1.2.

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x^{(0)} \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := \mbox{ next direction } \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \mbox{ for some scalar } \alpha_k \\ ~~~ r^{(k+1)} := b - A x^{(k+1)} \\ ~~~ k := k+1 \\ {\bf endwhile} \end{array} \end{equation*}
Figure 8.2.1.2. Outline for a descent method.
Toward this goal, an exact descent method typically picks \(\alpha_k \) to exactly minimize the function along the line from the current approximate solution in the direction of \(p^{(k)} \text{.}\)

Now,

\begin{equation*} \begin{array}{l} f( x^{(k+1)} )\\ ~~~~=~~~ \lt x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \gt \\ f( x^{(k)} + \alpha_k p^{(k)} )\\ ~~~~=~~~ \lt \mbox{ evaluate } \gt \\ \frac{1}{2} \left( x^{(k)} + \alpha_k p^{(k)} \right)^T A \left( x^{(k)} + \alpha_k p^{(k)} \right) - \left( x^{(k)} + \alpha_k p^{(k)} \right)^T b \\ ~~~~=~~~ \lt \mbox{ multiply out } \gt \\ \frac{1}{2} x^{(k)\,T} A x^{(k)} + \alpha_k p^{(k)\,T} A x^{(k)} + \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)} - x^{(k)\,T} b - \alpha_k p^{(k)\,T} b \\ ~~~~=~~~\lt \mbox{ rearrange } \gt \\ \frac{1}{2} x^{(k)\,T} A x^{(k)} - x^{(k)\,T} b + \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)} + \alpha_k p^{(k)\,T} A x^{(k)} - \alpha_k p^{(k)\,T} b \\ ~~~~=~~~ \lt \mbox{ substitute } f( x^{(k)} ) \mbox{ and factor out common terms } \gt \\ f( x^{(k)} ) + \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)} + \alpha_k p^{(k)\,T} ( A x^{(k)} - b ) \\ ~~~~=~~~ \lt \mbox{ substitute } r^{(k)} \mbox{ and commute to expose polynomial in } \alpha_k \gt \\ \frac{1}{2} p^{(k)\,T} A p^{(k)} \alpha_k^2 - p^{(k)\,T} r^{(k)} \alpha_k + f( x^{(k)} ) , \end{array} \end{equation*}

where \(r^{(k)} = b - A x^{(k)}\) is the residual. This is a quadratic polynomial in the scalar \(\alpha_k \) (since this is the only free variable).
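The identity derived above is easy to check numerically. The following sketch (the matrix, vectors, and step sizes are arbitrary values chosen for illustration, not from the text) verifies that \(f( x^{(k)} + \alpha_k p^{(k)} )\) agrees with the quadratic \(\frac{1}{2} p^{(k)\,T} A p^{(k)} \alpha_k^2 - p^{(k)\,T} r^{(k)} \alpha_k + f( x^{(k)} ) \text{:}\)

```python
import numpy as np

# f(x) = 1/2 x^T A x - x^T b for a small SPD matrix (values chosen arbitrarily)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

def f(x):
    return 0.5 * x @ A @ x - x @ b

x = np.array([2.0, -1.0])   # current iterate x^(k)
p = np.array([1.0, 1.0])    # some search direction p^(k)
r = b - A @ x               # residual r^(k)

# f(x + alpha p) equals the quadratic in alpha derived in the text
for alpha in [-1.0, 0.0, 0.5, 2.0]:
    lhs = f(x + alpha * p)
    rhs = 0.5 * (p @ A @ p) * alpha**2 - (p @ r) * alpha + f(x)
    assert np.isclose(lhs, rhs)
```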

Minimizing

\begin{equation*} f( x^{(k+1)} ) = \frac{1}{2} p^{(k)\,T} A p^{(k)} \alpha_k^2 - p^{(k)\,T} r^{(k)} \alpha_k + f( x^{(k)} ) \end{equation*}

exactly requires the derivative with respect to \(\alpha_k \) to be zero:

\begin{equation*} 0 = \frac{ df( x^{(k)} + \alpha_k p^{(k)} ) }{ d\alpha_{k} } = p^{(k)\,T} A p^{(k)} \alpha_k - p^{(k)\,T} r^{(k)}. \end{equation*}

Hence, for a given choice of \(p^{(k)} \text{,}\)

\begin{equation*} \alpha_k = \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \quad \mbox{and} \quad x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \end{equation*}

provides the next approximation to the solution. This leaves us with the question of how to pick the search directions \(\{ p^{(0)}, p^{(1)}, \ldots \} \text{.}\)
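As a sanity check on this formula (again with arbitrary illustrative values for \(A \text{,}\) \(b \text{,}\) \(x^{(k)} \text{,}\) and \(p^{(k)} \text{,}\) none of which come from the text), the \(\alpha_k \) it produces indeed minimizes \(f\) along the line \(x^{(k)} + \alpha p^{(k)} \text{:}\)

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])  # an arbitrary SPD matrix
b = np.array([1.0, 2.0])

def f(x):
    return 0.5 * x @ A @ x - x @ b

x = np.array([2.0, -1.0])   # x^(k)
p = np.array([1.0, 1.0])    # p^(k)
r = b - A @ x               # r^(k)

alpha = (p @ r) / (p @ A @ p)   # exact line-search step from the text

# f at the chosen step is no larger than at any sampled step along the line
assert all(f(x + alpha * p) <= f(x + t * p) + 1e-12
           for t in np.linspace(-10.0, 10.0, 2001))
```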

A basic descent method based on these ideas is given in Figure 8.2.1.3.

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x^{(0)} \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := \mbox{ next direction } \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := b - A x^{(k+1)} \\ ~~~k := k+1 \\ {\bf endwhile} \end{array} \end{equation*}
Figure 8.2.1.3. Basic descent method.
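The loop in Figure 8.2.1.3 can be sketched in Python as follows. The figure deliberately leaves "next direction" unspecified; purely for illustration, this sketch uses the residual itself, \(p^{(k)} := r^{(k)} \) (the steepest descent choice), and stops when the residual is small rather than exactly zero:

```python
import numpy as np

def basic_descent(A, b, x0, tol=1e-10, max_iters=1000):
    """Basic descent method for SPD A; p^(k) := r^(k) chosen for illustration."""
    x = x0.astype(float)
    r = b - A @ x                       # r^(0) := b - A x^(0)
    for _ in range(max_iters):
        if np.linalg.norm(r) <= tol:    # while r^(k) != 0 (up to a tolerance)
            break
        p = r                           # next direction (here: the residual)
        alpha = (p @ r) / (p @ A @ p)   # exact line-search step
        x = x + alpha * p               # x^(k+1) := x^(k) + alpha_k p^(k)
        r = b - A @ x                   # r^(k+1) := b - A x^(k+1)
    return x

# Arbitrary small SPD example
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = basic_descent(A, b, np.zeros(2))
assert np.allclose(A @ x, b)
```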

Homework 8.2.1.1.

The cost of an iterative method is a combination of how many iterations it takes to converge and the cost per iteration. For the loop in Figure 8.2.1.3, count the number of matrix-vector multiplications, dot products, and "axpy" operations (not counting the cost of determining the next descent direction).

Solution
\begin{equation*} \begin{array}{l l} \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \amp 1 \mbox{ mvmult}, 2 \mbox{ dot products} \\ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \amp 1 \mbox{ axpy} \\ r^{(k+1)} := b - A x^{(k+1)} \amp 1 \mbox{ mvmult} \end{array} \end{equation*}

Total: 2 matrix-vector multiplies (mvmults), 2 dot products, 1 axpy.