Subsection 4.2.4 Conditioning of the linear least squares problem

Given $A \in \Cmxn$ with linearly independent columns and $b \in \Cm \text{,}$ consider the linear least squares (LLS) problem

$$\| b - A \widehat x \|_2 = \min_x \| b - A x \|_2 \label{eqn-LLS1}\tag{4.2.1}$$

and the perturbed problem

$$\| (b + \delta\!b ) - A (\widehat x+\delta\! \widehat x ) \|_2 = \min_{x} \| ( b + \delta\!b ) - A (x+\delta\!x ) \|_2 \label{eqn-LLS2}\tag{4.2.2}$$

The question we want to examine is by how much the relative error in $b$ is amplified into a relative error in $\widehat x \text{.}$ We will restrict our discussion to the case where $A$ has linearly independent columns.

Recall that $\widehat b \text{,}$ the projection of $b$ onto the column space of $A \text{,}$ satisfies

$$\widehat b = A \widehat x \label{eqn-LLS-cond1}\tag{4.2.3}$$

and the projection of $b + \delta\! b$ satisfies

$$\widehat b + \delta\! \widehat b = A ( \widehat x + \delta\! \widehat x ) \label{eqn-LLS-cond2}\tag{4.2.4}$$

where $\delta\! \widehat b$ equals the projection of $\delta\!b$ onto the column space of $A \text{.}$

Let $\theta$ equal the angle between the vector $b$ and its projection $\widehat b$ (which equals the angle between $b$ and the column space of $A$). Then

\begin{equation*} \cos(\theta) = \| \widehat b \|_2 / \| b \|_2 \end{equation*}

and hence

\begin{equation*} \cos(\theta) \| b \|_2 = \| \widehat b \|_2 = \| A \widehat x \|_2 \leq \| A \|_2 \| \widehat x \|_2 = \sigma_0 \| \widehat x \|_2 \end{equation*}

which (as long as $\widehat x \neq 0$) can be rewritten as

$$\frac{1}{\| \widehat x \|_2} \leq \frac{\sigma_0}{\cos( \theta )} \frac{1}{\| b \|_2}. \label{eqn-LLS-cond3}\tag{4.2.5}$$
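As a sanity check, the bound (4.2.5) can be verified numerically on a small example (a hypothetical illustration using NumPy; the matrix and right-hand side are made up):

```python
import numpy as np

# Hypothetical check of (4.2.5): 1/||xhat||_2 <= sigma_0 / (cos(theta) ||b||_2).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([0.6, 0.8, 1.0])

xhat = np.linalg.lstsq(A, b, rcond=None)[0]   # least squares solution
bhat = A @ xhat                               # projection of b onto C(A)

cos_theta = np.linalg.norm(bhat) / np.linalg.norm(b)
sigma0 = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value

lhs = 1 / np.linalg.norm(xhat)
rhs = sigma0 / (cos_theta * np.linalg.norm(b))
assert lhs <= rhs + 1e-12
```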

Subtracting (4.2.3) from (4.2.4) yields

\begin{equation*} \delta\! \widehat b = A \delta\! \widehat x \end{equation*}

or, equivalently, $A \delta\! \widehat x = \delta\! \widehat b \text{,}$ which is solved by

\begin{equation*} \begin{array}{rcl} \delta\! \widehat x & = & A^\dagger \delta\! \widehat b \\ & = & A^\dagger A ( A^H A )^{-1} A^H \delta\! b \\ & = & ( A^H A )^{-1} A^H A ( A^H A )^{-1} A^H \delta\! b \\ & = & A^\dagger \delta\! b , \end{array} \end{equation*}

where $A^\dagger = ( A^H A )^{-1} A^H$ is the pseudoinverse of $A$ and we recall that $\delta\! \widehat b = A ( A^H A )^{-1} A^H \delta\! b \text{.}$ Hence

$$\| \delta\! \widehat x \|_2 \leq \| A^\dagger \|_2 \| \delta\! b \|_2.\label{eqn-LLS-cond4}\tag{4.2.6}$$
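The bound (4.2.6) can likewise be observed numerically (a hypothetical example; the perturbation $\delta\!b$ is made up):

```python
import numpy as np

# Hypothetical check of (4.2.6): perturb b in a small LLS problem and
# verify || delta xhat ||_2 <= || A^+ ||_2 || delta b ||_2.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.5])
delta_b = 1e-3 * np.array([1.0, -1.0, 2.0])

xhat = np.linalg.lstsq(A, b, rcond=None)[0]
xhat_pert = np.linalg.lstsq(A, b + delta_b, rcond=None)[0]
delta_xhat = xhat_pert - xhat

# || A^+ ||_2 is the 2-norm of the pseudoinverse.
bound = np.linalg.norm(np.linalg.pinv(A), 2) * np.linalg.norm(delta_b)
assert np.linalg.norm(delta_xhat) <= bound + 1e-12
```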
Homework 4.2.4.1.

Let $A \in \Cmxn$ have linearly independent columns. Show that

\begin{equation*} \| ( A^H A )^{-1} A^H \|_2 = 1/\sigma_{n-1}, \end{equation*}

where $\sigma_{n-1}$ equals the smallest singular value of $A \text{.}$

Hint

Use the reduced SVD of $A \text{.}$

Solution

Let $A = U_L \Sigma_{TL} V^H$ be the reduced SVD of $A \text{,}$ where $V$ is square because $A$ has linearly independent columns. Then

\begin{equation*} \begin{array}{rcl} \| ( A^H A )^{-1} A^H \|_2 & = & \| ( ( U_L \Sigma_{TL} V^H )^H U_L \Sigma_{TL} V^H )^{-1} ( U_L \Sigma_{TL} V^H )^H \|_2 \\ & = & \| ( V \Sigma_{TL} U_L^H U_L \Sigma_{TL} V^H )^{-1} V \Sigma_{TL} U_L^H \|_2 \\ & = & \| ( V \Sigma_{TL}^{-1} \Sigma_{TL}^{-1} V^H ) V \Sigma_{TL} U_L^H \|_2 \\ & = & \| V \Sigma_{TL}^{-1} U_L^H \|_2 \\ & = & \| \Sigma_{TL}^{-1} U_L^H \|_2 \\ & = & 1/\sigma_{n-1}. \end{array} \end{equation*}

This last step needs some more explanation: Clearly $\| \Sigma_{TL}^{-1} U_L^H \|_2 \leq \| \Sigma_{TL}^{-1} \|_2 \| U_L^H \|_2 = \frac{1}{\sigma_{n-1}} \| U_L^H \|_2 \leq 1/\sigma_{n-1} \text{.}$ We need to show that there exists a vector $x$ with $\| x \|_2 = 1$ such that $\| \Sigma_{TL}^{-1} U_L^H x \|_2 = 1/\sigma_{n-1} \text{.}$ If we pick $x = u_{n-1}$ (the last column of $U_L$), then $\| \Sigma_{TL}^{-1} U_L^H u_{n-1} \|_2 = \| \Sigma_{TL}^{-1} e_{n-1} \|_2 = \| \frac{1}{\sigma_{n-1}} e_{n-1} \|_2 = 1/\sigma_{n-1} \text{.}$
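The homework result can be confirmed numerically with a small example whose singular values are known by construction (a hypothetical sketch; for a real matrix, $A^H = A^T$):

```python
import numpy as np

# Hypothetical check of the homework result: for A with linearly independent
# columns, || (A^H A)^{-1} A^H ||_2 equals 1 / (smallest singular value of A).
A = np.array([[3.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])   # singular values: 3 and 1

pinv = np.linalg.inv(A.T @ A) @ A.T          # (A^H A)^{-1} A^H (real case)
sigma = np.linalg.svd(A, compute_uv=False)   # returned in descending order
assert np.isclose(np.linalg.norm(pinv, 2), 1 / sigma[-1])
```

Here `pinv` works out to $\begin{pmatrix} 1/3 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \text{,}$ whose 2-norm is $1 = 1/\sigma_{n-1} \text{.}$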

Combining (4.2.5), (4.2.6), and the result in this last homework yields

$$\frac{\| \delta\! \widehat x\|_2}{\| \widehat x \|_2} \leq \frac{1}{\cos( \theta )} \frac{\sigma_0}{\sigma_{n-1}} \frac{\| \delta\! b \|_2}{\| b \|_2}. \label{eqn-LLS-cond5}\tag{4.2.7}$$

Notice the effect of the $\cos( \theta )$ term. If $b$ is almost perpendicular to $\Col( A ) \text{,}$ then its projection $\widehat b$ is small and $\cos( \theta )$ is small. Hence a small relative change in $b$ can be greatly amplified. This makes sense: if $b$ is almost perpendicular to $\Col( A ) \text{,}$ then $\widehat x \approx 0 \text{,}$ and any small $\delta\!b \in \Col(A)$ can yield a relatively large change $\delta\! \widehat x \text{.}$
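This amplification can be made concrete with a small, made-up example in which $\cos( \theta ) \approx 10^{-6} \text{:}$ a relative change of about $10^{-6}$ in $b$ doubles $\widehat x \text{.}$

```python
import numpy as np

# Hypothetical illustration: b nearly perpendicular to the column space of A.
A = np.array([[1.0],
              [0.0]])                  # C(A) = span{e_0}
b = np.array([1e-6, 1.0])              # cos(theta) ~ 1e-6
delta_b = np.array([1e-6, 0.0])        # ||delta_b||_2 / ||b||_2 ~ 1e-6

xhat = np.linalg.lstsq(A, b, rcond=None)[0]
xhat_pert = np.linalg.lstsq(A, b + delta_b, rcond=None)[0]

rel_change_b = np.linalg.norm(delta_b) / np.linalg.norm(b)
rel_change_x = np.linalg.norm(xhat_pert - xhat) / np.linalg.norm(xhat)
# rel_change_b is ~1e-6 while rel_change_x is ~1:
# amplification by roughly 1/cos(theta).
```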

Definition 4.2.4.1. Condition number of matrix with linearly independent columns.

Let $A \in \Cmxn$ have linearly independent columns (and hence $n \leq m$). Then its condition number (with respect to the 2-norm) is defined by

\begin{equation*} \kappa_2( A ) = \| A \|_2 \| A^\dagger \|_2 = \frac{\sigma_0}{\sigma_{n-1}}. \end{equation*}
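The two expressions in this definition can be checked against each other numerically (a hypothetical example whose singular values are known by construction):

```python
import numpy as np

# Hypothetical check: kappa_2(A) = ||A||_2 ||A^+||_2 matches sigma_0/sigma_{n-1}.
A = np.array([[2.0, 0.0],
              [0.0, 0.5],
              [0.0, 0.0]])   # singular values: 2 and 0.5, so kappa_2(A) = 4

sigma = np.linalg.svd(A, compute_uv=False)
kappa_from_svd = sigma[0] / sigma[-1]
kappa_from_norms = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.pinv(A), 2)
assert np.isclose(kappa_from_svd, 4.0)
assert np.isclose(kappa_from_norms, kappa_from_svd)
```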

It is informative to explicitly expose $\cos( \theta ) = \| \widehat b \|_2/ \| b \|_2$ in (4.2.7):

\begin{equation*} \frac{\| \delta\! \widehat x\|_2}{\| \widehat x \|_2} \leq \frac{\| b \|_2}{\| \widehat b \|_2} \frac{\sigma_0}{\sigma_{n-1}} \frac{\| \delta\! b \|_2}{\| b \|_2}. \end{equation*}

Notice that the ratio

\begin{equation*} \frac{\| \delta\! b \|_2}{\| b \|_2} \end{equation*}

can be made smaller by adding a component, $b_r \text{,}$ to $b$ that is orthogonal to $\Col( A )$ (and hence does not change the projection onto the column space, $\widehat b$):

\begin{equation*} \frac{\| \delta\! b \|_2}{\| b + b_r \|_2}. \end{equation*}

The factor $1/\cos( \theta )$ ensures that this does not magically reduce the relative error in $\widehat x \text{:}$

\begin{equation*} \frac{\| \delta\! \widehat x\|_2}{\| \widehat x \|_2} \leq \frac{\| b + b_r \|_2}{\| \widehat b \|_2} \frac{\sigma_0}{\sigma_{n-1}} \frac{\| \delta\! b \|_2}{\| b + b_r \|_2}. \end{equation*}
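The key fact underlying this observation, that a component orthogonal to $\Col( A )$ leaves $\widehat x$ unchanged, can be checked numerically (a made-up example; $b_r$ is chosen orthogonal to the column space):

```python
import numpy as np

# Hypothetical check: adding b_r orthogonal to C(A) leaves xhat unchanged.
A = np.array([[1.0],
              [0.0]])
b = np.array([1.0, 0.0])
b_r = np.array([0.0, 5.0])   # orthogonal to C(A) = span{e_0}

x1 = np.linalg.lstsq(A, b, rcond=None)[0]
x2 = np.linalg.lstsq(A, b + b_r, rcond=None)[0]
assert np.allclose(x1, x2)   # projection, hence xhat, is unchanged
```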