Subsection 2.10 Question
Question 10.
Consider
\begin{equation*} A = \left( \begin{array}{r r} -1 \amp 0 \\ 2 \amp -1 \\ 2 \amp 1 \end{array} \right) \quad \text{and} \quad b = \left( \begin{array}{r} 2 \\ 6 \\ 4 \end{array} \right). \end{equation*}
Give the best approximate solution (in the linear least squares sense) to
\begin{equation*} A x \approx b. \end{equation*} What is the orthogonal projection of \(b \) onto the column space of \(A \text{?}\)
TRUE/FALSE: Is \(b \) in the column space of \(A \text{?}\)
Answer
\(\displaystyle \left(\begin{array}{r} 2\\ -1 \end{array}\right). \)
\(\displaystyle \left(\begin{array}{r} -2\\ 5 \\ 3 \end{array}\right).\)
-
FALSE
Why?
Solution
-
Given a matrix \(A \in \mathbb R^{m \times n} \) and vector \(b \in \mathbb R^m \text{,}\) the approximate solution, in the linear least squares sense, to \(A x = b \) is a vector \(x \) that solves
\begin{equation*} \min_{x \in \mathbb R^n} \| b - A x \|_2, \end{equation*}minimizing the Euclidean length of the difference between \(b\) and \(A x \text{.}\) If the columns of \(A \) are linearly independent, the solution is unique. It is not hard to prove that the given \(A \) has linearly independent columns.
A straightforward way of computing this best approximate solution is to solve the normal equations: \(A^T A x = A^T b.\) Thus, we solve
\begin{equation*} \left( \begin{array}{r r} -1 \amp 0 \\ 2 \amp -1 \\ 2 \amp 1 \end{array} \right)^T \left( \begin{array}{r r} -1 \amp 0 \\ 2 \amp -1 \\ 2 \amp 1 \end{array} \right) \left( \begin{array}{c} \chi_0 \\ \chi_1 \end{array} \right) = \left( \begin{array}{r r} -1 \amp 0 \\ 2 \amp -1 \\ 2 \amp 1 \end{array} \right)^T \left( \begin{array}{r} 2 \\ 6 \\ 4 \end{array} \right), \end{equation*}which simplifies to
\begin{equation*} \left( \begin{array}{r r} 9 \amp 0 \\ 0 \amp 2 \end{array} \right) \left( \begin{array}{c} \chi_0 \\ \chi_1 \end{array} \right) = \left(\begin{array}{r} 18\\ -2 \end{array}\right), \end{equation*}(it is a happy coincidence that here \(A^T A \) is diagonal) and finally yields the solution
\begin{equation*} \left( \begin{array}{c} \chi_0 \\ \chi_1 \end{array} \right) = \left(\begin{array}{r} 2\\ -1 \end{array}\right). \end{equation*}We notice that \(x = ( A^T A )^{-1} A^T b \text{.}\) The matrix \(( A^T A)^{-1} A^T \) is known as the (left) pseudo-inverse of matrix \(A \text{.}\)
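As a numerical sanity check, the normal equations above can be solved in a few lines of NumPy (the variable names here are illustrative, not from the text):

```python
import numpy as np

# The matrix and right-hand side from the question.
A = np.array([[-1.0,  0.0],
              [ 2.0, -1.0],
              [ 2.0,  1.0]])
b = np.array([2.0, 6.0, 4.0])

# Solve the normal equations A^T A x = A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)  # [ 2. -1.]
```

Note that `A.T @ A` here equals the diagonal matrix \(\left(\begin{smallmatrix} 9 \amp 0 \\ 0 \amp 2 \end{smallmatrix}\right)\) computed above.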
-
The orthogonal projection of \(b \) onto the column space of \(A \) is the vector \(A x \text{,}\) where \(x \) solves
\begin{equation*} \min_{x \in \mathbb R^n} \| b - A x \|_2. \end{equation*}Since we have computed the \(x \) that minimizes this in the last part of this question, the answer is
\begin{equation*} \begin{array}{l c l} A x \amp = \amp A \underbrace{( A^T A )^{-1} A^T b}_{x} = \left( \begin{array}{r r} -1 \amp 0 \\ 2 \amp -1 \\ 2 \amp 1 \end{array} \right) \left( \begin{array}{c} \chi_0 \\ \chi_1 \end{array} \right) \\ \amp = \amp \left( \begin{array}{r r} -1 \amp 0 \\ 2 \amp -1 \\ 2 \amp 1 \end{array} \right) \left(\begin{array}{r} 2\\ -1 \end{array}\right) = \left(\begin{array}{r} -2\\ 5 \\ 3 \end{array}\right). \end{array} \end{equation*} Since the orthogonal projection of \(b \) onto the column space of \(A \) is not equal to \(b\text{,}\) we conclude that \(b \) is not in the column space of \(A \text{.}\)
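The projection and the TRUE/FALSE conclusion can also be checked numerically. This sketch uses NumPy's least squares routine rather than the normal equations; a nonzero residual \(b - A x\) confirms that \(b\) is not in the column space:

```python
import numpy as np

A = np.array([[-1.0,  0.0],
              [ 2.0, -1.0],
              [ 2.0,  1.0]])
b = np.array([2.0, 6.0, 4.0])

# Least squares solution without forming A^T A explicitly.
x, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
proj = A @ x                  # orthogonal projection of b onto Col(A)
print(proj)                   # [-2.  5.  3.]

# b is in the column space of A exactly when the projection equals b.
print(np.allclose(proj, b))   # False
```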
Relevance
Solving linear least squares problems is of central importance to data sciences and scientific computing. It is synonymous with "linear regression."
In our example, the matrix happens to have orthogonal columns, and as a result \(A^T A \) is diagonal. More generally, when \(A \) has linearly independent columns, \(A^T A \) is symmetric positive definite (SPD). This allows the solution of \(A^T A x = A^T b\) to utilize a variant on LU factorization known as the Cholesky factorization that takes advantage of symmetry to reduce the cost of finding the solution. Reducing the cost (in time) is important since in data sciences and scientific computing one often wants to solve very large problems.
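The Cholesky-based solve mentioned above can be sketched as follows. This is an illustration, not a production implementation (a practical code would use a triangular solver such as SciPy's `solve_triangular` to exploit the structure of \(L\)):

```python
import numpy as np

A = np.array([[-1.0,  0.0],
              [ 2.0, -1.0],
              [ 2.0,  1.0]])
b = np.array([2.0, 6.0, 4.0])

G = A.T @ A                       # SPD when A has linearly independent columns
L = np.linalg.cholesky(G)         # G = L L^T with L lower triangular

# Solve G x = A^T b via two triangular solves: L y = A^T b, then L^T x = y.
y = np.linalg.solve(L, A.T @ b)
x = np.linalg.solve(L.T, y)
print(x)  # [ 2. -1.]
```

The Cholesky factorization costs roughly half of an LU factorization because only the lower triangle of \(G\) needs to be computed and stored.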
The method of normal equations is the simplest way of solving linear least squares problems. In the practical setting discussed in advanced linear algebra, one finds out that computer arithmetic, which invariably introduces round-off error, can lead to an amplification of error when the columns of matrix \(A \) are almost linearly dependent.
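This amplification can be seen in the condition number: forming \(A^T A\) roughly squares the condition number of \(A\text{.}\) The sketch below (a hypothetical example, not from the text) compares the two for a matrix whose columns are nearly dependent:

```python
import numpy as np

# Two nearly parallel columns, differing only by a small perturbation eps.
eps = 1e-6
A_bad = np.array([[1.0, 1.0      ],
                  [1.0, 1.0 + eps],
                  [1.0, 1.0 - eps]])

# cond(A^T A) is roughly cond(A)^2, so the normal equations are far more
# sensitive to round-off error than the original least squares problem.
print(np.linalg.cond(A_bad))
print(np.linalg.cond(A_bad.T @ A_bad))
```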
Of great importance to linear algebra in general, and data science and scientific computing in particular, are the four fundamental spaces: the column, null, row, and left null spaces.
Resources