Skip to main content

Subsection 8.1.1 Solving linear systems by solving a minimization problem

Consider the quadratic polynomial

\begin{equation*} f( \chi ) = \frac{1}{2} \alpha \chi^2 - \beta \chi. \end{equation*}

Finding the value \(\widehat \chi \) that minimizes this polynomial can be accomplished via the steps:

  • Compute the derivative and set it to zero:

    \begin{equation*} f'( \widehat \chi ) = \alpha \widehat \chi - \beta = 0. \end{equation*}

    We notice that computing \(\widehat \chi \) is equivalent to solving the linear system (of one equation)

    \begin{equation*} \alpha \widehat \chi = \beta. \end{equation*}
  • It is a minimum if \(\alpha \gt 0 \) (the quadratic polynomial is concaved up).

Obviously, you can turn this around: in order to solve \(\alpha \widehat \chi = \beta \) where \(\alpha \gt 0 \text{,}\) we can instead minimize the polynomial

\begin{equation*} f( \chi ) = \frac{1}{2} \alpha \chi^2 - \beta \chi. \end{equation*}

This course does not have multivariate calculus as a prerequisite, so we will walk you through the basic results we will employ. We will focus on finding a solution to \(A x = b \) where \(A \) is symmetric positive definite (SPD). (In our discussions we will just focus on real-valued problems). Now, if

\begin{equation*} f( x ) = \frac{1}{2} x^T A x - x^T b, \end{equation*}

then its gradient equals

\begin{equation*} \nabla f( x ) = A x - b . \end{equation*}

The function \(f( x ) \) is minimized (when \(A \) is SPD) when its gradient equals zero, which allows us to compute the vector for which the function achieves its minimum. The basic insight is that in order to solve \(A \widehat x = b \) we can instead find the vector \(\widehat x \) that minimizes the function \(f( x ) = \frac{1}{2} x^T A x - x^T b \text{.}\)

This proof does not employ multivariate calculus!

Let \(A \widehat x = b \text{.}\) Then

\begin{equation*} \begin{array}{l} f( x ) \\ ~~~ = ~~~~ \lt \mbox{ definition of } f( x ) \gt \\ \frac{1}{2} x^T A x - x^T b \\ ~~~=~~~~ \lt A \widehat x = b \gt \\ \frac{1}{2} x^T A x - x^T A \widehat x \\ ~~~=~~~~ \lt \mbox{ algebra } \gt \\ \frac{1}{2} x^T A x - x^T A \widehat x + \begin{array}[t]{c} \underbrace{ \frac{1}{2} \widehat x^T A \widehat x - \frac{1}{2} \widehat x^T A \widehat x} \\ 0 \end{array} \\ ~~~=~~~~ \lt \mbox{ factor out } \gt \\ \frac{1}{2} (x-\widehat x)^T A (x-\widehat x) - \frac{1}{2} \widehat x^T A \widehat x. \end{array} \end{equation*}

Since \(\widehat x^T A \widehat x \) is independent of \(x \text{,}\) and \(A\) is SPD, this is clearly minimized when \(x = \widehat x \text{.}\)