## Unit8.2.1Basics of descent methods

###### Remark8.2.1.1.

In the video, the quadratic polynomial pictured takes on the value $- \frac{1}{2} \widehat x^T A \widehat x$ at $\widehat x$ and that minimum is below the x-axis. This does not change the conclusions that are drawn in the video.

The basic idea behind a descent method is that at the $k$th iteration one has an approximation to $x \text{,}$ $x^{(k)} \text{,}$ and one would like to create a better approximation, $x^{(k+1)} \text{.}$ To do so, the method picks a search direction, $p^{(k)} \text{,}$ and chooses the next approximation by taking a step from the current approximate solution in the direction of $p^{(k)} \text{:}$

\begin{equation*} x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}. \end{equation*}

In other words, one searches for a minimum along a line defined by the current iterate, $x^{(k)} \text{,}$ and the search direction, $p^{(k)} \text{.}$ One then picks $\alpha_k$ so that, preferrably, $f( x^{(k+1)} ) \leq f( x^{(k)} ) \text{.}$ This is summarized in Figure 8.2.1.2.

To this goal, typically, an exact descent method picks $\alpha_k$ to exactly minimize the function along the line from the current approximate solution in the direction of $p^{(k)} \text{.}$

Now,

\begin{equation*} \begin{array}{l} f( x^{(k+1)} )\\ ~~~~=~~~ \lt x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \gt \\ f( x^{(k)} + \alpha_k p^{(k)} )\\ ~~~~=~~~ \lt \mbox{ evaluate } \gt \\ \frac{1}{2} \left( x^{(k)} + \alpha_k p^{(k)} \right)^T A \left( x^{(k)} + \alpha_k p^{(k)} \right) - \left( x^{(k)} + \alpha_k p^{(k)} \right)^T b \\ ~~~~=~~~ \lt \mbox{ multiply out } \gt \\ \frac{1}{2} x^{(k)\,T} A x^{(k)} + \alpha_k p^{(k)\,T} A x^{(k)} + \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)} - x^{(k)\,T} b - \alpha_k p^{(k)\,T} b \\ ~~~~=~~~\lt \mbox{ rearrange } \gt \\ \frac{1}{2} x^{(k)\,T} A x^{(k)} - x^{(k)\,T} b + \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)} + \alpha_k p^{(k)\,T} A x^{(k)} - \alpha_k p^{(k)\,T} b \\ ~~~~=~~~ \lt \mbox{ substitute } f( x^{(k)} ) \mbox{ and factor out common terms } \gt \\ f( x^{(k)} ) + \frac{1}{2} \alpha_k^2 p^{(k)\,T} A p^{(k)} + \alpha_k p^{(k)\,T} ( A x^{(k)} - b ) \\ ~~~~=~~~ \lt \mbox{ substitute } r^{(k)} \mbox{ and commute to expose polynomial in } \alpha_k \\ \frac{1}{2} p^{(k)\,T} A p^{(k)} \alpha_k^2 - p^{(k)\,T} r^{(k)} \alpha_k + f( x^{(k)} ) , \end{array} \end{equation*}

where $r^{(k)} = b - A x^{(k)}$ is the residual. This is a quadratic polynomial in the scalar $\alpha_k$ (since this is the only free variable).

Minimizing

\begin{equation*} f( x^{(k+1)} ) = \frac{1}{2} p^{(k)\,T} A p^{(k)} \alpha_k^2 - p^{(k)\,T} r^{(k)} \alpha_k + f( x^{(k)} ) \end{equation*}

exactly requires the deriviative with respect to $\alpha_k$ to be zero:

\begin{equation*} 0 = \frac{ df( x^{(k)} + \alpha_k p^{(k)} ) }{ d\alpha_{k} } = p^{(k)\,T} A p^{(k)} \alpha_k - p^{(k)\,T} r^{(k)}. \end{equation*}

Hence, for a given choice of $p_k$

\begin{equation*} \alpha_k = \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \quad \mbox{and} \quad x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}. \end{equation*}

provides the next approximation to the solution. This leaves us with the question of how to pick the search directions $\{ p^{(0)}, p^{(1)}, \ldots \} \text{.}$

A basic decent method based on these ideas is given in Figure 8.2.1.3.

###### Homework8.2.1.1.

The cost of an iterative method is a combination of how many iterations it takes to convergence and the cost per iteration. For the loop in Figure 8.2.1.3, count the number of matrix-vector multiplications, dot products, and "axpy" operations (not counting the cost of determining the next descent direction).

Solution
\begin{equation*} \begin{array}{l l} \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \amp 1 \mbox{ mvmult}, 2 \mbox{ dot products} \\ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \amp 1 \mbox{ axpy} \\ r^{(k+1)} := b - A x^{(k+1)} \amp 1 \mbox{ mvmult} \end{array} \end{equation*}

Total: 2 matrix-vector multiplies (mvmults), 2 dot products, 1 axpy.