## Unit8.2.2Toward practical descent methods

Even though matrices are often highly sparse, a major part of the cost of solving $A x = b$ via descent methods is in the matrix-vector multiplication (a cost that is proportional to the number of nonzeroes in the matrix). For this reason, reducing the number of these is an important part of the design of the algorithm.

###### Homework8.2.2.1.

Let

\begin{equation*} \begin{array}{l} x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \\ r^{(k)} = b - A x^{(k)} \\ r^{(k+1)} = b - A x^{(k+1)} \end{array} \end{equation*}

Show that

\begin{equation*} r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)}. \end{equation*}
Solution
\begin{equation*} \begin{array}{l} r^{(k+1)} = b - A x^{(k+1)} \\ ~~~=~~~ \lt r^{(k)} = b - A x^{(k)} \gt \\ r^{(k+1)} = r^{(k)} + A x^{(k)} - A x^{(k+1)} \\ ~~~=~~~ \lt \mbox{ rearrange, factor } \gt \\ r^{(k+1)} = r^{(k)} - A ( x^{(k+1)} - x^{(k)} ) \\ ~~~=~~~ \lt x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \gt \\ r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)} \end{array} \end{equation*}

Alternatively:

\begin{equation*} \begin{array}{l} r^{(k+1)} = b - A x^{(k+1)} \\ ~~~=~~~ \lt x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \gt \\ r^{(k+1)} = b - A (x^{(k)} + \alpha_k p^{(k)} ) \\ ~~~=~~~ \lt \mbox{ distribute } \gt \\ r^{(k+1)} = b - A x^{(k)} - \alpha_k A p^{(k)} \\ ~~~=~~~ \lt \mbox{ definition of } r^{(k)} \gt \\ r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)} \end{array} \end{equation*}

With the insights from this last homework, we can reformulate our basic descent method into one with only one matrix-vector multiplication, as illustrated in Figure 8.2.2.1.

###### Homework8.2.2.2.

For loops in the algorithm in Figure 8.2.2.1 (Right), count the number of matrix-vector multiplications, dot products, and "axpy" operations (not counting the cost of determining the next descent direction).

Solution
\begin{equation*} \begin{array}{l l} q^{(k)} := A p^{(k)} \amp 1 \mbox{ mvmult} \\ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} qx^{(k)}} \amp 2 \mbox{ dot products} \\ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \amp 1 \mbox{ axpy} \\ r^{(k+1)} := r^{(k)} - \alpha_k q^{(k)} \amp 1 \mbox{axpy} \end{array} \end{equation*}

Total: 1 mvmults, 2 dot products, 2 axpys

We finish our discussion regarding basic descent methods by observing that we don't need to keep the history of vectors, $x^{(k)} \text{,}$ $p^{(k)} \text{,}$ $r^{(k)} \text{,}$ $q^{(k)} \text{,}$ and scalar $\alpha_k$ that were computed as long as they are not needed to compute the next search direction, leaving us with the algorithm