ALAFF Induced matrix norms

Subsection 1.3.4 Induced matrix norms

Recall from Subsection 1.3.1 that a matrix, \(A \in \C^{m \times n} \text{,}\) is a 2-dimensional array of numbers that represents a linear transformation, \(L: \C^n \rightarrow \C^m\text{,}\) such that for all \(x \in \C^n \) the matrix-vector multiplication \(A x \) yields the same result as does \(L( x ) \text{.}\)

The question "What is the norm of matrix \(A \text{?}\)" or, equivalently, "How 'large' is \(A\text{?}\)" is the same as asking the question "How 'large' is \(L \text{?}\)" What does this mean? It suggests that what we really want is a measure of how much linear transformation \(L \) or, equivalently, matrix \(A \) "stretches" (magnifies) the "length" of a vector. This observation motivates a class of matrix norms known as induced matrix norms.

Definition 1.3.4.1. Induced matrix norm.

Let \(\| \cdot \|_\mu: \C^m \rightarrow \mathbb R \) and \(\| \cdot \|_\nu: \C^n \rightarrow \mathbb R \) be vector norms. Define \(\| \cdot \|_{\mu,\nu} : \C^{m \times n} \rightarrow \mathbb R \) by

\begin{equation*} \| A \|_{\mu,\nu} = \sup_{ \begin{array}{c} x \in \C^n \\ x \neq 0 \end{array} } \frac{\| A x \|_\mu}{\| x \|_{\nu}}. \end{equation*}

Matrix norms that are defined in this way are said to be induced matrix norms.

Remark 1.3.4.2.

In context, it is obvious (from the column size of the matrix) what the size of vector \(x\) is. For this reason, we will write

\begin{equation*} \| A \|_{\mu,\nu} = \sup_{ \begin{array}{c} x \in \C^n \\ x \neq 0 \end{array} } \frac{\| A x \|_\mu}{\| x \|_{\nu}} \quad \mbox{ as } \quad \| A \|_{\mu,\nu} = \sup_{x \neq 0} \frac{\| A x \|_\mu}{\| x \|_{\nu}}. \end{equation*}

Let us start by interpreting this. How "large" \(A \) is, as measured by \(\| A \|_{\mu,\nu} \text{,}\) is defined as the most that \(A \) magnifies the length of nonzero vectors, where the length of the vector, \(x\text{,}\) is measured with norm \(\| \cdot \|_\nu \) and the length of the transformed vector, \(A x \text{,}\) is measured with norm \(\| \cdot \|_\mu \text{.}\)

Two comments are in order. First,

\begin{equation*} \sup_{ x \neq 0 } \frac{\| A x \|_\mu}{\| x \|_{\nu}} = \sup_{\| x \|_{\nu} = 1} {\| A x \|_\mu}{}. \end{equation*}

This follows from the following sequence of equivalences:

\begin{equation*} \begin{array}{l} \sup_{x \neq 0} \frac{\| A x \|_\mu}{\| x \|_{\nu}} \\ ~~~=~~~~ \lt \mbox{ homogeneity } \gt \\ \sup_{x \neq 0} \| \frac{A x}{\| x \|_{\nu}} \|_\mu \\ ~~~=~~~~ \lt \mbox{ norms are associative }\gt \\ \sup_{x \neq 0} \| A \frac{x}{\| x \|_{\nu}} \|_\mu \\ ~~~=~~~~ \lt \mbox{ substitute } y = x / \| x \|_\nu \gt \\ \sup_{\| y \|_\nu = 1} \| A y \|_\mu. \end{array} \end{equation*}

Second, the "\(\sup\)" (which stands for supremum) is used because we can't claim yet that there is a nonzero vector \(x \) for which

\begin{equation*} \sup_{x \neq 0} \frac{\| A x \|_\mu}{\| x \|_{\nu}} \end{equation*}

is attained or, alternatively, a vector, \(x \text{,}\) with \(\| x \|_\nu = 1 \) for which

\begin{equation*} \sup_{\| x \|_\nu = 1} \| A x \|_\mu \end{equation*}

is attained. In words, it is not immediately obvious that there is a vector for which the supremum is attained. The fact is that there is always such a vector \(x \text{.}\) The proof again depends on a result from real analysis, also employed in Proof 1.2.6.1, that states that \(\sup_{x \in S} f( x ) \) is attained for some vector \(x \in S \) as long as \(f \) is continuous and \(S \) is a compact set. For any norm, \(\| x \| = 1 \) is a compact set. Thus, we can replace \(\sup \) by \(\max \) from here on in our discussion.

We conclude that the following two definitions are equivalent definitions to the one we already gave:

Definition 1.3.4.3.

Let \(\| \cdot \|_\mu: \C^m \rightarrow \mathbb R \) and \(\| \cdot \|_\nu: \C^n \rightarrow \mathbb R \) be vector norms. Define \(\| \cdot \|_{\mu,\nu} : \C^{m \times n} \rightarrow \mathbb R \) by

\begin{equation*} \| A \|_{\mu,\nu} = \max_{x \neq 0} \frac{\| A x \|_\mu}{\| x \|_{\nu}}. \end{equation*}

or, equivalently,

\begin{equation*} \| A \|_{\mu,\nu} = \max_{\| x \|_{\nu}=1} \| A x \|_\mu. \end{equation*}

Remark 1.3.4.4.

In this course, we will often encounter proofs involving norms. Such proofs are much cleaner if one starts by strategically picking the most convenient of these two definitions. Until you gain the intuition needed to pick which one is better, you may have to start your proof using one of them and then switch to the other one if the proof becomes unwieldy.

Theorem 1.3.4.5.

\(\| \cdot \|_{\mu,\nu} : \C^{m \times n} \rightarrow \mathbb R \) is a norm.

Proof.

To prove this, we merely check whether the three conditions are met:

Let \(A, B \in \C^{m \times n} \) and \(\alpha \in \mathbb C \) be arbitrarily chosen. Then

\(A \neq 0 \Rightarrow \| A \|_{\mu,\nu} > 0 \) (\(\| \cdot \|_{\mu,\nu} \) is positive definite):

Notice that \(A \neq 0 \) means that at least one of its columns is not a zero vector (since at least one element is nonzero). Let us assume it is the \(j \)th column, \(a_j \text{,}\) that is nonzero. Let \(e_j \) equal the column of \(I \) (the identity matrix) indexed with \(j \text{.}\) Then

\begin{equation*} \begin{array}{l} \| A \|_{\mu,\nu} \\ ~~~=~~~~\lt \mbox{ definition} \gt \\ \max_{x \neq 0} \frac{\| A x \|_\mu}{\| x \|_{\nu}} \\ ~~~\ge ~~~~\lt e_j \mbox{ ~is~a~specific~vector} \gt \\ \frac{\| A e_j \|_\mu}{\| e_j \|_{\nu}} \\ ~~~=~~~~\lt A e_j = a_j \gt \\ \frac{\| a_j \|_\mu}{\| e_j \|_{\nu}} \\ ~~~\gt~~~~\lt \mbox{ we~assumed~that~} a_j \neq 0 \gt \\ 0. \end{array} \end{equation*}
\(\| \alpha A \|_{\mu,\nu} = \vert \alpha \vert \| A \|_{\mu,\nu} \) (\(\| \cdot \|_{\mu,\nu} \) is homogeneous):

\begin{equation*} \begin{array}{l} \| \alpha A \|_{\mu,\nu} \\ ~~~= ~~~~ \lt \mbox{ definition } \gt \\ \max_{x \neq 0} \frac{\| \alpha A x \|_\mu}{\| x \|_{\nu}} \\ ~~~=~~~~\lt \mbox{ homogeneity} \gt \\ \max_{x \neq 0} \vert \alpha \vert \frac{\| A x \|_\mu}{\| x \|_{\nu}} \\ ~~~=~~~~\lt \mbox{ algebra} \gt \\ \vert \alpha \vert \max_{x \neq 0} \frac{\| A x \|_\mu}{\| x \|_{\nu}} \\ ~~~=~~~~\lt \mbox{ definition} \gt \\ \vert \alpha \vert \| A \|_{\mu,\nu}. \end{array} \end{equation*}
\(\| A + B \|_{\mu,\nu} \leq \| A \|_{\mu,\nu} + \| B \|_{\mu,\nu} \) (\(\| \cdot \|_{\mu,\nu} \) obeys the triangle inequality).

\begin{equation*} \begin{array}{l} \| A + B \|_{\mu,\nu} \\ ~~~=~~~~ \lt \mbox{ definition } \gt \\ \max_{x \neq 0} \frac{\| ( A + B ) x \|_\mu}{\| x \|_{\nu}} \\ ~~~=~~~~ \lt \mbox{ distribute } \gt \\ \max_{x \neq 0} \frac{\| Ax + B x \|_\mu}{\| x \|_{\nu}} \\ ~~~\le~~~~ \lt \mbox{ triangle inequality } \gt \\ \max_{x \neq 0} \frac{\| Ax \|_\mu + \| B x \|_\mu}{\| x \|_{\nu}} \\ ~~~=~~~~ \lt \mbox{ algebra} \gt \\ \max_{x \neq 0} \left( \frac{\| Ax \|_\mu}{\| x \|_{\nu}} + \frac{\| B x \|_\mu}{\| x \|_{\nu}} \right) \\ ~~~\leq~~~~ \lt \mbox{ algebra} \gt \\ \max_{x \neq 0} \frac{\| Ax \|_\mu}{\| x \|_{\nu}} + \max_{x \neq 0} \frac{\| B x \|_\mu}{\| x \|_{\nu}} \\ ~~~=~~~~ \lt \mbox{ definition} \gt \\ \| A \|_{\mu,\nu} + \| B \|_{\mu,\nu} . \end{array} \end{equation*}

When \(\| \cdot \|_\mu \) and \(\| \cdot \|_\nu \) are the same norm (but possibly for different sizes of vectors), the induced norm becomes

Definition 1.3.4.6.

Define \(\| \cdot \|_{\mu} : \C^{m \times n} \rightarrow \mathbb R \) by

\begin{equation*} \| A \|_{\mu} = \max_{x \neq 0} \frac{\| A x \|_\mu}{\| x \|_\mu} \end{equation*}

or, equivalently,

\begin{equation*} \| A \|_{\mu} = \max_{\| x \|_{\mu}=1} \| A x \|_\mu. \end{equation*}

Homework 1.3.4.1.

Consider the vector \(p \)-norm \(\| \cdot \|_p: \Cn \rightarrow \mathbb R \) and let us denote the induced matrix norm by \(\vert\vert\vert \cdot \vert\vert\vert: \mathbb C^{m \times n} \rightarrow \mathbb R \) for this exercise: \(\vert\vert\vert A \vert\vert\vert = \max_{x \neq 0} \frac{\| A x \|_p}{\| x \|_p} \text{.}\)

ALWAYS/SOMETIMES/NEVER: \(\vert\vert\vert y \vert\vert\vert = \| y \|_p \) for \(y \in \Cm \text{.}\)

Answer

ALWAYS

Solution

\begin{equation*} \begin{array}{l} \vert\vert\vert y \vert\vert\vert \\ ~~~ = ~~~~ \lt \mbox{ definition } \gt \\ \max_{x \neq 0} \frac{ \| y x \|_p}{\| x \|_p} \\ ~~~ = ~~~~ \lt x \mbox{ is a scalar since } y \mbox{ is a matrix with one column. Then } \| x \|_p = \| ( \chi_0 ) \|_p = \sqrt[p]{\vert \chi_0 \vert^p} = \vert \chi_0 \vert \gt \\ \max_{\chi_0 \neq 0} \vert \chi_0 \vert \frac{ \| y \|_p}{\vert \chi_0 \vert} \\ ~~~ = ~~~~ \lt \mbox{ algebra } \gt \\ \max_{\chi_0 \neq 0} \| y \|_p \\ ~~~ = ~~~~ \lt \mbox{ algebra } \gt \\ \| y \|_p \end{array} \end{equation*}

This last exercise is important. One can view a vector \(x \in \Cm\) as an \(m \times 1 \) matrix. What this last exercise tells us is that regardless of whether we view \(x \) as a matrix or a vector, \(\| x \|_p \) is the same.

We already encountered the vector \(p \)-norms as an important class of vector norms. The matrix \(p\)-norm is induced by the corresponding vector norm, as defined by

Definition 1.3.4.7. Matrix \(p\)-norm.

For any vector \(p \)-norm, define the corresponding matrix \(p\)-norm \(\| \cdot \|_p : \C^{m \times n} \rightarrow \mathbb R \) by

\begin{equation*} \| A \|_p = \max_{x \neq 0} \frac{\| A x \|_p}{\| x \|_p} \quad \mbox{ or, equivalently,} \quad \| A \|_p = \max_{\| x \|_p = 1} \| A x \|_p. \end{equation*}

Remark 1.3.4.8.

The matrix \(p\)-norms with \(p \in \{ 1, 2, \infty \} \) will play an important role in our course, as will the Frobenius norm. As the course unfolds, we will realize that in practice the matrix 2-norm is of great theoretical importance but difficult to evaluate, except for special matrices. The 1-norm, \(\infty\)-norm, and Frobenius norms are straightforward and relatively cheap to compute (for an \(m \times n \) matrix, computing these costs \(O( mn ) \) computation).