
Unit 8.2.2 Toward practical descent methods

Even when the matrix \(A\) is highly sparse, a major part of the cost of solving \(A x = b \) via descent methods lies in the matrix-vector multiplications, whose cost is proportional to the number of nonzeroes in the matrix. For this reason, reducing the number of matrix-vector multiplications is an important part of the design of a practical algorithm.
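To make the cost claim concrete, here is a minimal sketch of a sparse matrix-vector multiply in CSR (compressed sparse row) storage, where the inner loops touch each stored nonzero exactly once; the tridiagonal example matrix is hypothetical, chosen only for illustration.

```python
# Sparse matrix-vector multiply y = A x with A stored in CSR format.
# The total work is proportional to the number of stored nonzeroes,
# not to n^2 as it would be for a dense matrix.

def csr_matvec(n, row_ptr, col_idx, vals, x):
    """Compute y = A x for an n x n matrix A given in CSR form."""
    y = [0.0] * n
    for i in range(n):                           # one pass over the rows
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]      # one multiply-add per nonzero
    return y

# Hypothetical tridiagonal example:
# A = [[4, 1, 0, 0],
#      [1, 4, 1, 0],
#      [0, 1, 4, 1],
#      [0, 0, 1, 4]]   (10 nonzeroes)
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
vals    = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]

y = csr_matvec(4, row_ptr, col_idx, vals, [1.0, 1.0, 1.0, 1.0])
print(y)  # [5.0, 6.0, 6.0, 5.0]
```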

Homework 8.2.2.1.

Let

\begin{equation*} \begin{array}{l} x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \\ r^{(k)} = b - A x^{(k)} \\ r^{(k+1)} = b - A x^{(k+1)} \end{array} \end{equation*}

Show that

\begin{equation*} r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)}. \end{equation*}
Solution
\begin{equation*} \begin{array}{l} r^{(k+1)} = b - A x^{(k+1)} \\ ~~~=~~~ \lt r^{(k)} = b - A x^{(k)} \gt \\ r^{(k+1)} = r^{(k)} + A x^{(k)} - A x^{(k+1)} \\ ~~~=~~~ \lt \mbox{ rearrange, factor } \gt \\ r^{(k+1)} = r^{(k)} - A ( x^{(k+1)} - x^{(k)} ) \\ ~~~=~~~ \lt x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \gt \\ r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)} \end{array} \end{equation*}

Alternatively:

\begin{equation*} \begin{array}{l} r^{(k+1)} = b - A x^{(k+1)} \\ ~~~=~~~ \lt x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)} \gt \\ r^{(k+1)} = b - A (x^{(k)} + \alpha_k p^{(k)} ) \\ ~~~=~~~ \lt \mbox{ distribute } \gt \\ r^{(k+1)} = b - A x^{(k)} - \alpha_k A p^{(k)} \\ ~~~=~~~ \lt \mbox{ definition of } r^{(k)} \gt \\ r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)} \end{array} \end{equation*}

With the insights from this last homework, we can reformulate our basic descent method into one with only one matrix-vector multiplication, as illustrated in Figure 8.2.2.1.

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x^{(0)} \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := \mbox{ next direction } \\ ~~~ \phantom{q^{(k)} := A p^{(k)}} \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := b - A x^{(k+1)} \\ ~~~k := k + 1 \\ {\bf endwhile} \end{array} \end{equation*}
\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x^{(0)} \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := \mbox{ next direction } \\ ~~~ \phantom{q^{(k)} := A p^{(k)}} \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} A p^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)} \\ ~~~ k := k + 1 \\ {\bf endwhile} \end{array} \end{equation*}
\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x^{(0)} \\ r^{(0)} := b - A x^{(0)} \\ k := 0 \\ {\bf while~} r^{(k)} \neq 0 \\ ~~~ p^{(k)} := \mbox{ next direction } \\ ~~~ q^{(k)} := A p^{(k)} \\ ~~~ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} q^{(k)}} \\ ~~~ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \\ ~~~ r^{(k+1)} := r^{(k)} - \alpha_k q^{(k)} \\ ~~~ k := k + 1 \\ {\bf endwhile} \end{array} \end{equation*}
Figure 8.2.2.1. Left: Basic descent method from the last unit. Middle: Minor modification that recasts the computation of the residual \(r^{(k+1)} \) as an update of the previous residual \(r^{(k)} \text{.}\) Right: Modification that reduces the number of matrix-vector multiplications by introducing the temporary vector \(q^{(k)} \text{.}\)
Homework 8.2.2.2.

For one iteration of the loop in the algorithm in Figure 8.2.2.1 (Right), count the number of matrix-vector multiplications, dot products, and "axpy" operations (not counting the cost of determining the next descent direction).

Solution
\begin{equation*} \begin{array}{l l} q^{(k)} := A p^{(k)} \amp 1 \mbox{ mvmult} \\ \alpha_k := \frac{p^{(k)\,T} r^{(k)}}{p^{(k)\,T} q^{(k)}} \amp 2 \mbox{ dot products} \\ x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)} \amp 1 \mbox{ axpy} \\ r^{(k+1)} := r^{(k)} - \alpha_k q^{(k)} \amp 1 \mbox{ axpy} \end{array} \end{equation*}

Total: 1 mvmult, 2 dot products, 2 axpys per iteration.

We finish our discussion of basic descent methods by observing that we need not keep the history of vectors \(x^{(k)} \text{,}\) \(p^{(k)} \text{,}\) \(r^{(k)} \text{,}\) \(q^{(k)} \text{,}\) and scalars \(\alpha_k\) that were computed, as long as they are not needed to determine the next search direction. This leaves us with the algorithm

\begin{equation*} \begin{array}{l} {\bf Given:~} A, b, x \\ r := b - A x \\ {\bf while~} r \neq 0 \\ ~~~ p := \mbox{ next direction } \\ ~~~ q := A p \\ ~~~ \alpha := \frac{p^T r}{p^T q} \\ ~~~ x := x + \alpha p \\ ~~~ r := r - \alpha q \\ {\bf endwhile} \end{array} \end{equation*}
Figure 8.2.2.2. The algorithm from Figure 8.2.2.1 (Right) storing only the most current vectors and scalar.
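The algorithm in Figure 8.2.2.2 can be sketched in a few lines of Python. The unit deliberately leaves "next direction" unspecified; the sketch below assumes the choice \(p := r\) (the steepest descent direction for symmetric positive definite \(A\)), which is one possible choice rather than the method this unit prescribes, and it replaces the exact test \(r \neq 0\) with a tolerance on \(\| r \|\), as a practical implementation must.

```python
# A minimal runnable sketch of the algorithm in Figure 8.2.2.2.
# Assumptions: p := r is used as the "next direction" (steepest descent
# for SPD A), and the loop stops when the residual norm is small.
import numpy as np

def descent_method(A, b, x, tol=1e-10, max_iter=1000):
    r = b - A @ x                      # r := b - A x
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:   # practical stand-in for "while r != 0"
            break
        p = r                          # p := next direction (assumed: p := r)
        q = A @ p                      # q := A p  (the only matvec per iteration)
        alpha = (p @ r) / (p @ q)      # alpha := (p^T r) / (p^T q)
        x = x + alpha * p              # x := x + alpha p      (axpy)
        r = r - alpha * q              # r := r - alpha q      (axpy)
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # small SPD example
b = np.array([1.0, 2.0])
x = descent_method(A, b, np.zeros(2))
print(np.allclose(A @ x, b))             # True
```

Note that each iteration performs exactly one matrix-vector multiplication, two dot products, and two axpy operations, matching the count from Homework 8.2.2.2.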