
Subsection 2.3.1 The Singular Value Decomposition Theorem

The following is probably the most important result in linear algebra:

Theorem 2.3.1.1. Singular Value Decomposition Theorem.

Given \(A \in \C^{m \times n} \text{,}\) there exist unitary \(U \in \C^{m \times m} \text{,}\) unitary \(V \in \C^{n \times n} \text{,}\) and \(\Sigma \in \R^{m \times n} \) such that \(A = U \Sigma V^H \text{,}\) where \(\Sigma = \FlaTwoByTwo{ \Sigma_{TL} }{ 0 }{ 0 }{ 0 } \) with \(\Sigma_{TL} = \diag{ \sigma_0, \cdots, \sigma_{r-1} } \) and \(\sigma_0 \geq \sigma_1 \geq \cdots \geq \sigma_{r-1} \gt 0 \text{.}\)

Before proving this theorem, we are going to put some intermediate results in place.

Remark 2.3.1.2.

As the course progresses, we will notice that there is a conflict between the notation that explicitly exposes indices, e.g.,

\begin{equation*} U = \left( \begin{array}{c c c c} u_0 \amp u_1 \amp \cdots \amp u_{n-1} \end{array} \right) \end{equation*}

and the notation we use to hide such explicit indexing, which we call the FLAME notation, e.g.,

\begin{equation*} U = \FlaOneByThreeR{ U_0 }{ u_1 }{ U_2 }. \end{equation*}

The two are linked by

\begin{equation*} \left( \begin{array}{c | c c } \begin{array}[t]{c} \underbrace{ \begin{array}{ c c c} u_0 \amp \cdots \amp u_{k-1} \end{array} } \\ U_0 \end{array} \amp \begin{array}[t]{c} \underbrace{u_k}\\ u_1 \end{array} \amp \begin{array}[t]{c} \underbrace{ \begin{array}{ c c c} u_{k+1} \amp \cdots \amp u_{n-1} \end{array} } \\ U_2 \end{array} \end{array} \right). \end{equation*}

In algorithms that use explicit indexing, \(k \) often is the loop index that identifies where in the matrix or vector the algorithm currently has reached. In the FLAME notation, the index \(1 \) identifies that place. This creates a conflict for the two distinct items that are both indexed with \(1 \text{,}\) e.g., \(u_1 \) in our example here. It is our experience that learners quickly adapt to this, and hence we have not tried to introduce even more notation that avoids the conflict. In other words: you will almost always be able to tell from context what is meant. The following lemma and its proof illustrate this further.
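If it helps to see the two notations side by side in code, the following small sketch in Python with numpy (the matrix and variable names are made up purely for illustration and are not part of the course materials) shows how the FLAME blocks \(U_0 \text{,}\) \(u_1 \text{,}\) and \(U_2 \) around column \(k \) relate to explicit indexing:

import numpy as np

U = np.arange(24.0).reshape(4, 6)  # any 4 x 6 matrix with columns u_0, ..., u_5
k = 2                              # loop index in an explicitly indexed algorithm

# Explicit indexing: the k-th column.
u_k = U[:, k]

# FLAME-style repartitioning ( U0 | u1 | U2 ) around column k.
U0 = U[:, :k]       # columns u_0, ..., u_{k-1}
u1 = U[:, k]        # the column the algorithm currently works with
U2 = U[:, k + 1:]   # columns u_{k+1}, ..., u_{n-1}

# The "u1" of the FLAME notation is the "u_k" of the explicitly indexed notation.
assert np.array_equal(u1, u_k)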

Lemma 2.3.1.3.

Given \(A \in \C^{m \times n} \) with \(\sigma_1 = \| A \|_2 \gt 0 \text{,}\) there exist unitary matrices \(\widetilde U \in \C^{m \times m} \) and \(\widetilde V \in \C^{n \times n} \) such that \(A = \widetilde U \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right) \widetilde V^H \text{,}\) where \(B \in \C^{(m-1) \times (n-1)} \text{.}\)

Proof.

In the proof below, it is really important to keep track of when a line is part of the partitioning of a matrix or vector, and when it denotes scalar division.

Choose \(\sigma_1 \) and \(\widetilde v_1 \in \C^n \) such that

  • \(\| \widetilde v_1 \|_2 = 1 \text{;}\) and

  • \(\sigma_1 = \| A \widetilde v_1 \|_2 = \| A \|_2 \text{.}\)

In other words, \(\widetilde v_1 \) is a unit vector that attains \(\max_{\| x \|_2 = 1} \| A x \|_2 \text{.}\)

Let \(\widetilde u_1 = A \widetilde v_1 / \sigma_1 \text{.}\) Then

\begin{equation*} \| \widetilde u_1 \|_2 = \| A \widetilde v_1 \|_2 / \sigma_1 = \| A \widetilde v_1 \|_2 / \| A \|_2 = \| A \|_2 / \| A \|_2 = 1. \end{equation*}

Choose \(\widetilde U_2 \in \C^{m \times (m-1)} \) and \(\widetilde V_2 \in \C^{n \times (n-1)} \) so that

\begin{equation*} \widetilde{U} = \left( \begin{array}{c | c} \widetilde u_1 \amp \widetilde U_2 \end{array} \right) \mbox{ and } \widetilde{V} = \left( \begin{array}{c | c} \widetilde v_1 \amp \widetilde V_2 \end{array} \right) \end{equation*}

are unitary. Then

\begin{equation*} \begin{array}{l} \widetilde U^H A \widetilde V \\ ~~~=~~~~ \lt \mbox{ instantiate } \gt \\ \left( \begin{array}{c | c} \widetilde u_1 \amp \widetilde U_2 \end{array} \right)^H A \left( \begin{array}{c | c} \widetilde v_1 \amp \widetilde V_2 \end{array} \right) \\ ~~~=~~~~ \lt \mbox{ multiply out } \gt \\ \left( \begin{array}{c | c} \widetilde u_1^H A \widetilde v_1 \amp \widetilde u_1^H A \widetilde V_2 \\ \hline \widetilde U_2^H A \widetilde v_1 \amp \widetilde U_2^H A \widetilde V_2 \end{array} \right) \\ ~~~=~~~~ \lt A \widetilde v_1 = \sigma_1 \widetilde u_1\gt \\ \left( \begin{array}{c | c} \sigma_1 \widetilde u_1^H \widetilde u_1 \amp \widetilde u_1^H A \widetilde V_2 \\ \hline \sigma_1 \widetilde U_2^H \widetilde u_1 \amp \widetilde U_2^H A \widetilde V_2 \end{array} \right) \\ ~~~=~~~~ \lt \widetilde u_1^H \widetilde u_1 = 1; \widetilde U_2^H \widetilde u_1 = 0 ; \mbox{ pick } w = \widetilde V_2^H A^H \widetilde u_1 \mbox{ and } B = \widetilde U_2^H A \widetilde V_2 \gt \\ \left( \begin{array}{c | c} \sigma_1 \amp w^H \\ \hline 0 \amp B \end{array} \right), \end{array} \end{equation*}

where \(w = \widetilde V_2^H A^H \widetilde u_1 \) and \(B = \widetilde U_2^H A \widetilde V_2 \text{.}\)

We will now argue that \(w = 0 \text{,}\) the zero vector of appropriate size:

\begin{equation*} \begin{array}{l} \sigma_1^2 \\ ~~~=~~~~\lt \mbox{assumption} \gt \\ \| A \|_2^2\\ ~~~=~~~~\lt \mbox{ 2-norm is invariant under multiplication by unitary matrix } \gt \\ \| \widetilde U^H A \widetilde V \|_2^2 \\ ~~~=~~~~\lt \mbox{ definition of } \| \cdot \|_2 \gt \\ \max_{x \neq 0} \frac{\| \widetilde U^H A \widetilde V x \|_2^2}{\| x \|_2^2} \\ ~~~=~~~~\lt \mbox{ see above } \gt \\ \max_{x \neq 0} \frac{ \left\| \left( \begin{array}{c | c} \sigma_1 \amp w^H \\ \hline 0 \amp B \end{array} \right) x \right\|_2^2} {\| x \|_2^2} \\ ~~~\ge~~~~\lt x \mbox{ replaced by specific vector } \gt \\ \frac{\left\| \left( \begin{array}{c | c} \sigma_1 \amp w^H \\ \hline 0 \amp B \end{array} \right) \left( \begin{array}{c} \sigma_1 \\ \hline w \end{array} \right) \right\|_2^2}{\left\| \left( \begin{array}{c} \sigma_1 \\ \hline w \end{array} \right) \right\|_2^2} \\ ~~~=~~~~\lt \mbox{ multiply out numerator } \gt \\ \frac{\left\| \left( \begin{array}{c } \sigma_1^2 + w^H w \\ \hline B w \end{array} \right) \right\|_2^2}{\left\| \left( \begin{array}{c} \sigma_1 \\ \hline w \end{array} \right) \right\|_2^2} \\ ~~~\geq~~~~\lt \left\| \left( \begin{array}{c} \psi_1 \\ \hline y_2 \end{array} \right) \right\|_2^2 = \| \psi_1 \|_2^2 + \| y_2 \|_2^2 \geq \| \psi_1 \|_2^2; \left\| \left( \begin{array}{c} \sigma_1 \\ \hline w \end{array} \right) \right\|_2^2 = \sigma_1^2 + w^H w \gt \\ { ( \sigma_1^2 + w^H w )^2 }/{ (\sigma_1^2 + w^H w) } \\ ~~~=~~~~ \lt \mbox{ algebra } \gt \\ \sigma_1^2 + w^H w. \end{array} \end{equation*}

Thus \(\sigma_1^2 \geq \sigma_1^2 + w^H w \) which means that \(w = 0 \) (the zero vector) and \(\widetilde U^H A \widetilde V = \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right)\) so that \(A = \widetilde U \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right) \widetilde V^H \text{.}\)

Hopefully you can see where this is going: If one can recursively find that \(B = U_B \Sigma_B V_B^H \text{,}\) then

\begin{equation*} \begin{array}{rcl} A \amp = \amp \widetilde U \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right) \widetilde V^H \\ \amp = \amp \widetilde U \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp U_B \Sigma_B V_B^H \end{array} \right) \widetilde V^H \\ \amp = \amp \widetilde U \left( \begin{array}{c | c} 1 \amp 0 \\ \hline 0 \amp U_B \end{array} \right) \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp \Sigma_B \end{array} \right) \left( \begin{array}{c | c} 1 \amp 0 \\ \hline 0 \amp V_B^H \end{array} \right) \widetilde V^H \\ \amp = \amp \begin{array}[t]{c} \underbrace{ \widetilde U \left( \begin{array}{c | c} 1 \amp 0 \\ \hline 0 \amp U_B \end{array} \right) } \\ U \end{array} \begin{array}[t]{c} \underbrace{ \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp \Sigma_B \end{array} \right)} \\ \Sigma \end{array} \begin{array}[t]{c} \underbrace{ \left( \widetilde V \left( \begin{array}{c | c} 1 \amp 0 \\ \hline 0 \amp V_B \end{array} \right) \right)^H. } \\ V^H \end{array} \end{array}\text{.} \end{equation*}
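If you would like to see this recursion in action, the following sketch in Python with numpy is one way to carry it out. It is not part of the course materials: the function names svd_by_recursion and complete_to_unitary are invented for this illustration, and \(\widetilde v_1 \) is computed, as a practical shortcut, from an eigenvector of \(A^H A \) associated with its largest eigenvalue (the proof only asserts that such a maximizing vector exists).

import numpy as np

def complete_to_unitary(x, rng):
    # Return a unitary matrix whose first column is the unit vector x.
    n = x.shape[0]
    M = np.column_stack([x, rng.standard_normal((n, n - 1))]).astype(complex)
    Q, R = np.linalg.qr(M)
    Q[:, 0] = Q[:, 0] * R[0, 0]   # undo the phase QR placed on the first column
    return Q

def svd_by_recursion(A, rng=None):
    # Recursive construction of A = U Sigma V^H for m >= n, mirroring the proof.
    if rng is None:
        rng = np.random.default_rng(0)
    A = np.asarray(A, dtype=complex)
    m, n = A.shape
    if n == 0 or np.allclose(A, 0.0):          # Case 1: no columns, or A = 0
        return np.eye(m, dtype=complex), np.zeros((m, n)), np.eye(n, dtype=complex)
    _, W = np.linalg.eigh(A.conj().T @ A)      # eigenvalues in ascending order
    v1 = W[:, -1]                              # unit vector maximizing ||A x||_2
    sigma1 = np.linalg.norm(A @ v1)            # equals ||A||_2
    u1 = (A @ v1) / sigma1
    Utilde = complete_to_unitary(u1, rng)      # ( u1 | U2 )
    Vtilde = complete_to_unitary(v1, rng)      # ( v1 | V2 )
    B = Utilde[:, 1:].conj().T @ A @ Vtilde[:, 1:]
    UB, SB, VB = svd_by_recursion(B, rng)      # B = UB SB VB^H, size (m-1) x (n-1)
    U = Utilde @ np.block([[np.eye(1, dtype=complex), np.zeros((1, m - 1))],
                           [np.zeros((m - 1, 1)), UB]])
    V = Vtilde @ np.block([[np.eye(1, dtype=complex), np.zeros((1, n - 1))],
                           [np.zeros((n - 1, 1)), VB]])
    Sigma = np.block([[np.array([[sigma1]]), np.zeros((1, n - 1))],
                      [np.zeros((m - 1, 1)), SB]])
    return U, Sigma, V

# Quick sanity check on a random 5 x 3 complex matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
U, Sigma, V = svd_by_recursion(A)
assert np.allclose(U @ Sigma @ V.conj().T, A)
assert np.allclose(U.conj().T @ U, np.eye(5))
assert np.allclose(V.conj().T @ V, np.eye(3))
print(np.round(np.diag(Sigma), 4))             # singular values, largest first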

The next exercise provides the insight that the values on the diagonal of \(\Sigma \) will be ordered from largest to smallest.

Homework 2.3.1.1.

Let \(A \in \C^{m \times n} \) with \(A = \left( \begin{array}{ c | c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right) \) and assume that \(\| A \|_2 = \sigma_1 \text{.}\)

ALWAYS/SOMETIMES/NEVER: \(\| B \|_2 \leq \sigma_1 \text{.}\)

Answer

ALWAYS

Solution

We will employ a proof by contradiction. Assume that \(\| B \|_2 > \sigma_1 \text{.}\) Then there exists a vector \(z \) with \(\| z \|_2 = 1 \) such that \(\| B \|_2 = \| B z \|_2 = \max_{\| x \|_2 = 1} \| B x \|_2 \text{.}\) But then

\begin{equation*} \begin{array}{l} \| A \|_2 \\ ~~~ = ~~~~ \lt {\rm definition} \gt \\ \max_{\| x \|_2 = 1} \| A x \|_2 \\ ~~~\geq~~~~ \lt {\rm pick~a~specific~vector~with~2-norm~equal~to~one} \gt \\ \left\| A \left( \begin{array}{c} 0 \\ \hline z \end{array} \right) \right\|_2 \\ ~~~=~~~~ \lt \mbox{ instantiate } A \gt \\ \left\| \left( \begin{array}{ c | c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right) \left( \begin{array}{c} 0 \\ \hline z \end{array} \right) \right\|_2 \\ ~~~=~~~~ \lt \mbox{ partitioned matrix-vector multiplication } \gt \\ \left\| \left( \begin{array}{c} 0 \\ \hline B z \end{array} \right) \right\|_2\\ ~~~=~~~~ \lt \left\| \left( \begin{array}{c} y_0 \\ y_1 \end{array}\right) \right\|_2^2 = \| y_0 \|_2^2 + \| y_1 \|_2^2 \gt \\ \| B z \|_2 \\ ~~~=~~~~ \lt \mbox{ assumption about } z \gt \\ \| B \|_2 \\ ~~~\gt ~~~~ \lt \mbox{ assumption } \gt \\ \sigma_1. \end{array} \end{equation*}

This contradicts the assumption that \(\| A \|_2 = \sigma_1 \text{.}\)

Hence \(\| B \|_2 \leq \sigma_1 \text{.}\)
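A quick numerical illustration of this homework (a Python/numpy sketch with made-up data): construct \(A \) from a random \(B \) and a \(\sigma_1 \geq \| B \|_2 \text{,}\) and observe that \(\| A \|_2 = \sigma_1 \) while \(\| B \|_2 \leq \sigma_1 \text{.}\)

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 2))
sigma1 = np.linalg.norm(B, 2) + 0.5          # ensure sigma1 >= ||B||_2
A = np.block([[np.array([[sigma1]]), np.zeros((1, 2))],
              [np.zeros((3, 1)), B]])

assert np.isclose(np.linalg.norm(A, 2), sigma1)   # ||A||_2 = sigma1
assert np.linalg.norm(B, 2) <= sigma1             # the claim of the homework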

We are now ready to prove the Singular Value Decomposition Theorem.

We will prove this for \(m \geq n \text{,}\) leaving the case where \(m \leq n \) as an exercise.

Proof by induction: Since \(m \geq n \text{,}\) we select \(m \) to be arbitrary and induct on \(n \text{.}\)

  • Base case: \(n = 1 \text{.}\)

    In this case \(A = \left( \begin{array}{c} a_1 \end{array} \right) \) where \(a_1 \in \Cm \) is its only column.

    Case 1: \(a_1 = 0 \) (the zero vector).

    Then

    \begin{equation*} A = \left( \begin{array}{c} 0 \end{array} \right) = \begin{array}[t]{c} \underbrace{ I_{m \times m} } \\ U \end{array} \left( \begin{array}{ c | c } ~ \amp ~ \\ \hline ~ \amp 0 \end{array} \right) \begin{array}[t]{c} \underbrace{ I_{1 \times 1} } \\ V^H \end{array} \end{equation*}

    so that \(U = I_{m \times m} \text{,}\) \(V = I_{1 \times 1} \text{,}\) and \(\Sigma_{TL} \) is an empty matrix.

    Case 2: \(a_1 \neq 0 \text{.}\)

    Then

    \begin{equation*} A = \left( \begin{array}{c} a_1 \end{array} \right) = \left( \begin{array}{c} u_1 \end{array} \right) \left( \| a_1 \|_2 \right) \end{equation*}

    where \(u_1 = a_1 / \| a_1 \|_2 \text{.}\) Choose \(U_2 \in \C^{m \times (m-1)} \) so that \(U = \left( \begin{array}{c | c} u_1 \amp U_2 \end{array} \right)\) is unitary. Then

    \begin{equation*} \begin{array}{rcl} A \amp = \amp \left( \begin{array}{c} a_1 \end{array} \right) \\ \amp = \amp \left( \begin{array}{c} u_1 \end{array} \right) ( \| a_1 \|_2 ) \\ \amp = \amp \left( \begin{array}{c| c} u_1 \amp U_2 \end{array} \right) \left( \begin{array}{c | c } \| a_1 \|_2 \amp ~\\ \hline 0 \amp ~ \end{array} \right) \left( \begin{array}{c} 1 \end{array} \right)^H \\ \amp = \amp U \Sigma V^H, \end{array} \end{equation*}

    where

    • \(U = \left( \begin{array}{c| c} u_1 \amp U_2 \end{array} \right) \text{,}\)

    • \(\Sigma = \left( \begin{array}{ c | c} \Sigma_{TL} \amp ~ \\ \hline 0 \amp ~ \end{array} \right) \) with \(\Sigma_{TL} = \left( \begin{array}{c} \sigma_1 \end{array} \right) \) and \(\sigma_1 = \| a_1 \|_2 = \| A \|_2 \)

    • \(V = \left( \begin{array}{c} 1 \end{array} \right) \text{.}\)

  • Inductive step:

    Assume the result is true for all matrices with \(k \) columns, where \(1 \leq k \text{.}\) Show that it is then true for matrices with \(k+1 \) columns.

    Let \(A \in \C^{m \times (k+1)} \) with \(1 \leq k \lt n \text{.}\)

    Case 1: \(A = 0 \) (the zero matrix)

    Then

    \begin{equation*} A = I_{m \times m} \left( \begin{array}{c |c } ~ \amp ~ \\ \hline 0_{m \times (k+1)} \amp ~ \end{array} \right) I_{(k+1) \times (k+1)} \end{equation*}

    so that \(U = I_{m \times m} \text{,}\) \(V = I_{(k+1) \times (k+1)} \text{,}\) and \(\Sigma_{TL} \) is an empty matrix.

    Case 2: \(A \neq 0 \text{.}\)

    Then \(\| A \|_2 \neq 0 \text{.}\) By Lemma 2.3.1.3, we know that there exist unitary \(\widetilde U \in \C^{m \times m} \) and \(\widetilde V \in \C^{(k+1) \times (k+1)} \) such that \(A = \widetilde U \left( \begin{array}{c|c} \sigma_1 \amp 0 \\ \hline 0 \amp B \end{array} \right)\widetilde V^H \) with \(\sigma_1 = \| A \|_2 \text{.}\)

    By the inductive hypothesis, there exist unitary \(\check U_B \in \C^{ (m-1) \times (m-1)} \text{,}\) unitary \(\check V_B \in \C^{k \times k} \text{,}\) and \(\check \Sigma_B \in \R^{(m-1) \times k}\) such that \(B = \check U_B \check \Sigma_B \check V_B^H \) where \(\check \Sigma_B = \FlaTwoByTwo{ \check \Sigma_{TL} }{ 0 }{ 0 }{ 0 } \text{,}\) \(\check \Sigma_{TL} = \diag{ \sigma_2, \cdots , \sigma_{r-1} } \text{,}\) and \(\sigma_2 \geq \cdots \geq \sigma_{r-1} \gt 0 \text{.}\)

    Now, let

    \begin{equation*} U = \widetilde U \left( \begin{array}{c | c} 1 \amp 0 \\ \hline 0 \amp \check U_B \end{array} \right), V = \widetilde V \left( \begin{array}{c | c} 1 \amp 0 \\ \hline 0 \amp \check V_B \end{array} \right), \mbox{ and } \Sigma = \left( \begin{array}{c | c} \sigma_1 \amp 0 \\ \hline 0 \amp \check \Sigma_B \end{array} \right). \end{equation*}

    (The "checks" in the definitions of \(U \text{,}\) \(V \text{,}\) and \(\Sigma \) are easy to miss!) Then \(A = U \Sigma V^H \text{,}\) where \(U \text{,}\) \(V \text{,}\) and \(\Sigma\) have the desired properties. Key here is that \(\sigma_1 = \| A \|_2 \geq \| B \|_2 \) (by Homework 2.3.1.1), which means that \(\sigma_1 \geq \sigma_2 \text{.}\)

  • By the Principle of Mathematical Induction the result holds for all matrices \(A \in \C^{m \times n} \) with \(m \geq n \text{.}\)

Homework 2.3.1.2.

Let \(\Sigma = \diag{\sigma_0, \ldots, \sigma_{n-1}} \text{.}\) ALWAYS/SOMETIMES/NEVER: \(\| \Sigma \|_2 = \max_{i=0}^{n-1} \vert \sigma_i \vert \text{.}\)

Answer

ALWAYS

Now prove it.

Solution

Yes, you have seen this before, in Homework 1.3.5.1. We repeat it here because of its importance to this topic.

\begin{equation*} \begin{array}{rcl} \| \Sigma \|_2^2 \amp=\amp \max_{\| x \|_2 = 1} \| \Sigma x \|_2^2 \\ \amp = \amp \max_{\| x \|_2 = 1} \left\| \left( \begin{array}{c c c c} \sigma_0 \amp 0 \amp \cdots \amp 0 \\ 0 \amp \sigma_1 \amp \cdots \amp 0 \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0 \amp \cdots \amp \sigma_{n-1} \end{array} \right) \left( \begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{array} \right) \right\|_2^2 \\ \amp=\amp \max_{\| x \|_2 = 1} \left\| \left( \begin{array}{c} \sigma_0 \chi_0 \\ \sigma_1 \chi_1 \\ \vdots \\ \sigma_{n-1} \chi_{n-1} \end{array} \right) \right\|_2^2 \\ \amp = \amp \max_{\| x \|_2 = 1} \left[ \sum_{j=0}^{n-1} \vert \sigma_j \chi_j \vert^2 \right] \\ \amp = \amp \max_{\| x \|_2 = 1} \left[ \sum_{j=0}^{n-1} \left[ \vert \sigma_j \vert^2 \vert \chi_j \vert^2 \right] \right] \\ \amp\leq \amp \max_{\| x \|_2 = 1} \left[ \sum_{j=0}^{n-1} \left[ \max_{i=0}^{n-1} \vert \sigma_i \vert^2 \vert \chi_j \vert^2 \right] \right] \\ \amp = \amp \max_{\| x \|_2 = 1} \left[ \max_{i=0}^{n-1} \vert \sigma_i \vert^2 \sum_{j=0}^{n-1} \vert \chi_j \vert^2 \right] \\ \amp = \amp \left( \max_{i=0}^{n-1} \vert \sigma_i \vert \right)^2 \max_{\| x \|_2 = 1} \| x \|_2^2 \\ \amp=\amp \left( \max_{i=0}^{n-1} \vert \sigma_i \vert \right)^2 . \end{array} \end{equation*}

so that \(\| \Sigma \|_2 \leq \max_{i=0}^{n-1} \vert \sigma_i \vert \text{.}\)

Also, choose \(j \) so that \(\vert \sigma_j \vert = \max_{i=0}^{n-1} \vert \sigma_i \vert \text{.}\) Then

\begin{equation*} \begin{array}{rcl} \| \Sigma \|_2 = \max_{\| x \|_2=1} \| \Sigma x \|_2 \geq \| \Sigma e_j \|_2 = \| \sigma_j e_j \|_2 = \vert \sigma_j \vert \| e_j \|_2 = \vert \sigma_j \vert = \max_{i=0}^{n-1} \vert \sigma_i \vert . \end{array} \end{equation*}

so that \(\max_{i=0}^{n-1} \vert \sigma_i \vert \leq \| \Sigma \|_2 \leq \max_{i=0}^{n-1} \vert \sigma_i \vert \text{,}\) which implies that \(\| \Sigma \|_2 = \max_{i=0}^{n-1} \vert \sigma_i \vert \text{.}\)
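A quick numerical check of this result (a numpy sketch; the diagonal entries are arbitrary and include a negative one to exercise the absolute value):

import numpy as np

sigmas = np.array([3.0, -5.0, 2.0, 0.0])
Sigma = np.diag(sigmas)

# The 2-norm of a diagonal matrix equals the largest entry in absolute value.
assert np.isclose(np.linalg.norm(Sigma, 2), np.max(np.abs(sigmas)))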

Homework 2.3.1.3.

Assume that \(U \in \C^{m \times m} \) and \(V \in \C^{n \times n} \) are unitary matrices. Let \(A, B \in \C^{m \times n} \) with \(B = U A V^H \text{.}\) Show that the singular values of \(A \) equal the singular values of \(B \text{.}\)

Solution

Let \(A = U_A \Sigma_A V_A^H\) be the SVD of \(A \text{.}\) Then \(B = U U_A \Sigma_A V_A^H V^H = ( U U_A ) \Sigma_A (V V_A)^H \) where both \(U U_A \) and \(V V_A \) are unitary. This gives us the SVD for \(B \) and it shows that the singular values of \(B \) equal the singular values of \(A \text{.}\)
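The same fact is easy to observe numerically (a numpy sketch; for illustration, the unitary matrices are taken to be Q factors from QR factorizations of random matrices):

import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
U, _ = np.linalg.qr(rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
B = U @ A @ V.conj().T

# Singular values are unchanged under multiplication by unitary matrices.
assert np.allclose(np.linalg.svd(A, compute_uv=False),
                   np.linalg.svd(B, compute_uv=False))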

Homework 2.3.1.4.

Let \(A \in \C^{m \times n} \) with \(n \leq m \) and \(A = U \Sigma V^H \) be its SVD.

ALWAYS/SOMETIMES/NEVER: \(A^H = V \Sigma^T U^H \text{.}\)

Answer

ALWAYS

Solution
\begin{equation*} A^H = ( U \Sigma V^H )^H = (V^H)^H \Sigma^T U^H = V \Sigma^T U^H \end{equation*}

since \(\Sigma \) is real-valued. Notice that \(\Sigma \) is only "sort of diagonal" (it is possibly rectangular), which is why \(\Sigma^T \) cannot simply be replaced by \(\Sigma \text{.}\)
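Numerically (a numpy sketch; note that numpy returns the singular values as a vector, so the rectangular \(\Sigma \) has to be assembled explicitly):

import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

U, s, VH = np.linalg.svd(A)            # full SVD: U is m x m, VH is n x n
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)             # rectangular, "sort of diagonal" Sigma

V = VH.conj().T
assert np.allclose(A, U @ Sigma @ VH)
assert np.allclose(A.conj().T, V @ Sigma.T @ U.conj().T)   # A^H = V Sigma^T U^H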

Homework 2.3.1.5.

Prove the Singular Value Decomposition Theorem for \(m \leq n \text{.}\)

Hint

Consider the SVD of \(B = A^H \text{.}\)

Solution

Let \(B = A^H \text{.}\) Since it is \(n \times m \) with \(n \geq m \text{,}\) its SVD exists: \(B = U_B \Sigma_B V_B^H \text{.}\) Then \(A = B^H = V_B \Sigma_B^T U_B^H \) and hence \(A = U \Sigma V^H \) with \(U = V_B \text{,}\) \(\Sigma = \Sigma_B^T \text{,}\) and \(V = U_B \text{.}\)
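This construction is easy to carry out numerically (a numpy sketch for a wide matrix, using numpy's built-in SVD of \(B = A^H \)):

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                                   # a wide matrix: m <= n
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

B = A.conj().T                                # B = A^H is n x m with n >= m
UB, sB, VBH = np.linalg.svd(B)                # B = UB SigmaB VB^H
SigmaB = np.zeros((n, m))
SigmaB[:m, :m] = np.diag(sB)

U = VBH.conj().T                              # U = VB
Sigma = SigmaB.T                              # Sigma = SigmaB^T, of size m x n
V = UB                                        # V = UB
assert np.allclose(A, U @ Sigma @ V.conj().T)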
