
Subsection 1.3.1 Of linear transformations and matrices

We briefly review the relationship between linear transformations and matrices, which is key to understanding why linear algebra is all about matrices and vectors.

Definition 1.3.1.1. Linear transformations and matrices.

Let \(L : \Cn \rightarrow \Cm \text{.}\) Then \(L \) is said to be a linear transformation if for all \(\alpha \in \mathbb C \) and \(x, y \in \Cn \)

  • \(L( \alpha x ) = \alpha L( x ) \text{.}\) That is, scaling first and then transforming yields the same result as transforming first and then scaling.

  • \(L( x + y ) = L( x ) + L( y ) \text{.}\) That is, adding first and then transforming yields the same result as transforming first and then adding.
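For example, for a fixed matrix \(A \text{,}\) the function \(L( x ) = A x \) satisfies both conditions (a fact this section formalizes). A minimal numerical check, sketched in Python with NumPy and with real matrices standing in for complex ones:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # a fixed matrix; L( x ) = A x is linear
L = lambda x: A @ x

alpha = 2.5
x = rng.standard_normal(4)
y = rng.standard_normal(4)

# L( alpha x ) = alpha L( x ): scaling and transforming commute
assert np.allclose(L(alpha * x), alpha * L(x))
# L( x + y ) = L( x ) + L( y ): adding and transforming commute
assert np.allclose(L(x + y), L(x) + L(y))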

The importance of linear transformations comes in part from the fact that many problems in science boil down to the following: given a function \(F: \Cn \rightarrow \Cm \) and a vector \(y \in \Cm \text{,}\) find \(x \) such that \(F( x ) = y \text{.}\) This is known as an inverse problem. Under mild conditions, \(F \) can be locally approximated with a linear transformation \(L \) and then, as part of a solution method, one would want to solve \(L x = y \text{.}\)

The following theorem provides the link between linear transformations and matrices:

Theorem 1.3.1.2.

Let \(L : \Cn \rightarrow \Cm \) be a linear transformation, \(v_0, v_1, \ldots, v_{k-1} \in \Cn \text{,}\) and \(\alpha_0, \alpha_1, \ldots, \alpha_{k-1} \in \mathbb C \text{.}\) Then

\begin{equation*} L( \alpha_0 v_0 + \alpha_1 v_1 + \cdots + \alpha_{k-1} v_{k-1} ) = \alpha_0 L( v_0 ) + \alpha_1 L( v_1 ) + \cdots + \alpha_{k-1} L( v_{k-1} ). \end{equation*}

A simple inductive proof yields the result. For details, see Week 2 of Linear Algebra: Foundations to Frontiers (LAFF) [27].

The following set of vectors ends up playing a crucial role throughout this course:

Definition 1.3.1.3. Standard basis vector.

In this course, we will use \(e_j \in \Cm \) to denote the standard basis vector with a "1" in the position indexed with \(j \text{.}\) So,

\begin{equation*} e_j = \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{array} \right) \begin{array}{c} \phantom{0} \\ \phantom{\vdots} \\ \phantom{0} \\ \longleftarrow j \\ \phantom{0} \\ \phantom{\vdots} \\ \phantom{0} \end{array} \end{equation*}

Key is the fact that any vector \(x \in \Cn \) can be written as a linear combination of the standard basis vectors of \(\Cn \text{:}\)

\begin{equation*} \begin{array}{rcl} x \amp = \amp \left( \begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{array} \right) = \chi_0 \left( \begin{array}{c} 1 \\ 0 \\ \vdots \\ 0 \end{array} \right) + \chi_1 \left( \begin{array}{c} 0 \\ 1 \\ \vdots \\ 0 \end{array} \right) + \cdots + \chi_{n-1} \left( \begin{array}{c} 0 \\ 0 \\ \vdots \\ 1 \end{array} \right) \\ \amp = \amp \chi_0 e_0 + \chi_1 e_1 + \cdots + \chi_{n-1} e_{n-1}. \end{array} \end{equation*}
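This decomposition is easy to verify numerically. A sketch, assuming Python with NumPy, that rebuilds \(x \) from the columns of the identity:

import numpy as np

n = 4
x = np.array([1.0, -2.0, 3.0, 0.5])   # the components chi_0, ..., chi_{n-1}
I = np.eye(n)                          # the columns of I are e_0, ..., e_{n-1}

# x = chi_0 e_0 + chi_1 e_1 + ... + chi_{n-1} e_{n-1}
reconstructed = sum(x[j] * I[:, j] for j in range(n))
assert np.allclose(reconstructed, x)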

Hence, if \(L \) is a linear transformation,

\begin{equation*} \begin{array}{rcl} L( x ) \amp = \amp L( \chi_0 e_0 + \chi_1 e_1 + \cdots + \chi_{n-1} e_{n-1} ) \\ \amp= \amp \chi_0 \begin{array}[t]{c} \underbrace{ L( e_0 )} \\ a_0 \end{array} + \chi_1 \begin{array}[t]{c} \underbrace{ L( e_1 )} \\ a_1 \end{array} + \cdots + \chi_{n-1} \begin{array}[t]{c} \underbrace{ L( e_{n-1} )} \\ a_{n-1} \end{array} . \end{array} \end{equation*}

If we now let \(a_j = L( e_j ) \) (the vector \(a_j \) is the transformation of the standard basis vector \(e_j \)) and collect these vectors into a two-dimensional array of numbers:

\begin{equation} A = \left( \begin{array}{c | c | c | c} a_0 \amp a_1 \amp \cdots \amp a_{n-1} \end{array} \right)\label{chapter01-matrix-by-columns}\tag{1.3.1} \end{equation}

then we notice that all the information needed to evaluate \(L( x )\) is contained in this array, since \(L( x )\) can alternatively be computed by

\begin{equation*} L( x ) = \chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1}. \end{equation*}

We call the array \(A \) in (1.3.1) a matrix and the operation \(A x = \chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1} \) matrix-vector multiplication. Clearly

\begin{equation*} A x = L( x ). \end{equation*}
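The following sketch (again assuming Python with NumPy and real data) checks that the linear combination of the columns of \(A \) matches the built-in matrix-vector product:

import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# A x = chi_0 a_0 + chi_1 a_1 + ... + chi_{n-1} a_{n-1}
linear_combination = sum(x[j] * A[:, j] for j in range(n))
assert np.allclose(linear_combination, A @ x)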
Remark 1.3.1.4. Notation.

In these notes, as a rule,

  • Roman upper case letters are used to denote matrices.

  • Roman lower case letters are used to denote vectors.

  • Greek lower case letters are used to denote scalars.

Corresponding letters from these three sets are used to refer to a matrix, the rows or columns of that matrix, and the elements of that matrix. If \(A \in \mathbb C^{m \times n} \) then

\begin{equation*} \begin{array}{l} A \\ ~~~ = ~~~~ \lt \mbox{ partition } A \mbox{ by columns and rows } \gt \\ \left( \begin{array}{c| c | c | c} a_0 \amp a_1 \amp \cdots \amp a_{n-1} \end{array} \right) = \left( \begin{array}{c} \widetilde a_0^T \\ \hline \widetilde a_1^T \\ \hline \vdots \\ \hline \widetilde a_{m-1}^T \end{array} \right) \\ ~~~ = ~~~~ \lt \mbox{ expose the elements of } A \gt \\ \left( \begin{array}{c| c | c | c} \alpha_{0,0} \amp \alpha_{0,1} \amp \cdots \amp \alpha_{0,n-1} \\ \hline \alpha_{1,0} \amp \alpha_{1,1} \amp \cdots \amp \alpha_{1,n-1} \\ \hline \vdots \amp \vdots \amp \amp \vdots \\ \hline \alpha_{m-1,0} \amp \alpha_{m-1,1} \amp \cdots \amp \alpha_{m-1,n-1} \end{array} \right) \end{array} \end{equation*}

We now notice that the standard basis vector \(e_j \in \Cm \) equals the column of the \(m \times m \) identity matrix indexed with \(j \text{:}\)

\begin{equation*} I = \left( \begin{array}{c | c | c | c} 1 \amp 0 \amp \cdots \amp 0 \\ 0 \amp 1 \amp \cdots \amp 0 \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0 \amp \cdots \amp 1 \end{array} \right) = \left( \begin{array}{c | c | c | c} e_0 \amp e_1 \amp \cdots \amp e_{m-1} \end{array} \right) = \left( \begin{array}{c} \widetilde e_0^T \\ \hline \widetilde e_1^T \\ \hline \vdots \\ \hline \widetilde e_{m-1}^T \end{array} \right) . \end{equation*}
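In NumPy-style indexing (a sketch; the variable names are ours, not the notes'), these partitionings correspond to slices of the array, and multiplying \(A \) by \(e_j \) extracts the column of \(A \) indexed with \(j \text{:}\)

import numpy as np

m, n = 3, 4
A = np.arange(m * n, dtype=float).reshape(m, n)

a_1 = A[:, 1]        # column of A indexed with 1
at_2 = A[2, :]       # row of A indexed with 2 (written \widetilde a_2^T above)
alpha_2_1 = A[2, 1]  # the element alpha_{2,1}
assert at_2[1] == alpha_2_1

e_1 = np.eye(n)[:, 1]             # standard basis vector e_1, a column of the identity
assert np.allclose(A @ e_1, a_1)  # A e_j equals a_j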
Remark 1.3.1.5.

The important thing to note is that a matrix is a convenient representation of a linear transformation and matrix-vector multiplication is an alternative way for evaluating that linear transformation.

Let's investigate matrix-matrix multiplication and its relationship to linear transformations. Consider two linear transformations

\begin{equation*} \begin{array}{ll} L_A: \Ck \rightarrow \Cm \amp \mbox{represented by matrix } A \\ L_B: \Cn \rightarrow \Ck \amp \mbox{represented by matrix } B \end{array} \end{equation*}

and define

\begin{equation*} L_C( x ) = L_A ( L_B( x ) ), \end{equation*}

as the composition of \(L_A\) and \(L_B \text{.}\) Then it can be easily shown that \(L_C \) is also a linear transformation. Let \(m \times n\) matrix \(C \) represent \(L_C \text{.}\) How are \(A \text{,}\) \(B \text{,}\) and \(C \) related? If we let \(c_j \) equal the column of \(C \) indexed with \(j \text{,}\) then because of the link between matrices, linear transformations, and standard basis vectors

\begin{equation*} c_j = L_C( e_j )= L_A( L_B ( e_j ) ) = L_A( b_j ) = A b_j , \end{equation*}

where \(b_j \) equals the column of \(B \) indexed with \(j \text{.}\) Now, we say that \(C = A B \) is the product of \(A \) and \(B \) defined by

\begin{equation*} \left( \begin{array}{c | c | c | c} c_0 \amp c_1 \amp \cdots \amp c_{n-1} \end{array} \right) = A \left( \begin{array}{c | c | c | c} b_0 \amp b_1 \amp \cdots \amp b_{n-1} \end{array} \right) = \left( \begin{array}{c | c | c | c} A b_0 \amp A b_1 \amp \cdots \amp A b_{n-1} \end{array} \right) \end{equation*}

and define the matrix-matrix multiplication as the operation that computes

\begin{equation*} C := A B, \end{equation*}

which you will want to pronounce "C becomes A times B" to distinguish assignment from equality. If you think carefully about how individual elements of \(C \) are computed, you will realize that they equal the usual "dot product of rows of \(A \) with columns of \(B \text{.}\)"
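A sketch, assuming Python with NumPy, that checks both that the product represents the composition and that each column of \(C \) equals \(A \) times the corresponding column of \(B \text{:}\)

import numpy as np

rng = np.random.default_rng(2)
m, k, n = 3, 4, 5
A = rng.standard_normal((m, k))   # represents L_A
B = rng.standard_normal((k, n))   # represents L_B
x = rng.standard_normal(n)

C = A @ B                         # C := A B
# composition: L_C( x ) = L_A( L_B( x ) )
assert np.allclose(C @ x, A @ (B @ x))
# c_j = A b_j for each column indexed with j
for j in range(n):
    assert np.allclose(C[:, j], A @ B[:, j])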

As already mentioned, throughout this course, it will be important that you can think about matrices in terms of their columns and rows, and matrix-matrix multiplication (and other operations with matrices and vectors) in terms of columns and rows. It is also important to be able to think about matrix-matrix multiplication in three different ways. If we partition each matrix by rows and by columns:

\begin{equation*} C = \left( \begin{array}{c | c | c} c_0 \amp \cdots \amp c_{n-1} \end{array} \right) = \left( \begin{array}{c} \widetilde c_0^T \\ \hline \vdots \\ \hline \widetilde c_{m-1}^T \end{array} \right), A = \left( \begin{array}{c | c | c} a_0 \amp \cdots \amp a_{k-1} \end{array} \right) = \left( \begin{array}{c} \widetilde a_0^T \\ \hline \vdots \\ \hline \widetilde a_{m-1}^T \end{array} \right), \end{equation*}

and

\begin{equation*} B = \left( \begin{array}{c | c | c} b_0 \amp \cdots \amp b_{n-1} \end{array} \right) = \left( \begin{array}{c} \widetilde b_0^T \\ \hline \vdots \\ \hline \widetilde b_{k-1}^T \end{array} \right), \end{equation*}

then \(C := A B \) can be computed in the following ways:

  1. By columns:

    \begin{equation*} \left( \begin{array}{c | c | c} c_0 \amp \cdots \amp c_{n-1} \end{array} \right) := A \left( \begin{array}{c | c | c} b_0 \amp \cdots \amp b_{n-1} \end{array} \right) = \left( \begin{array}{c | c | c} A b_0 \amp \cdots \amp A b_{n-1} \end{array} \right). \end{equation*}

    In other words, \(c_j := A b_j \) for all columns of \(C \text{.}\)

  2. By rows:

    \begin{equation*} \left( \begin{array}{c} \widetilde c_0^T \\ \hline \vdots \\ \hline \widetilde c_{m-1}^T \end{array} \right) := \left( \begin{array}{c} \widetilde a_0^T \\ \hline \vdots \\ \hline \widetilde a_{m-1}^T \end{array} \right) B = \left( \begin{array}{c} \widetilde a_0^T B \\ \hline \vdots \\ \hline \widetilde a_{m-1}^T B \end{array} \right). \end{equation*}

    In other words, \(\widetilde c_i^T := \widetilde a_i^T B\) for all rows of \(C \text{.}\)

  3. One you may not have thought about much before:

    \begin{equation*} C := \left( \begin{array}{c | c | c} a_0 \amp \cdots \amp a_{k-1} \end{array} \right) \left( \begin{array}{c} \widetilde b_0^T \\ \hline \vdots \\ \hline \widetilde b_{k-1}^T \end{array} \right) = a_0 \widetilde b_0^T + \cdots + a_{k-1} \widetilde b_{k-1} ^T, \end{equation*}

    which should be thought of as a sequence of rank-1 updates, since each term is an outer product and an outer product has rank of at most one.
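All three views compute the same matrix. The sketch below (assuming Python with NumPy) spells out each view and compares it against the built-in product:

import numpy as np

rng = np.random.default_rng(3)
m, k, n = 3, 4, 5
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
C_ref = A @ B

# 1. By columns: c_j := A b_j
C_cols = np.column_stack([A @ B[:, j] for j in range(n)])

# 2. By rows: c_i^T := a_i^T B
C_rows = np.vstack([A[i, :] @ B for i in range(m)])

# 3. As a sum of rank-1 updates: C := a_0 b_0^T + ... + a_{k-1} b_{k-1}^T
C_rank1 = np.zeros((m, n))
for p in range(k):
    C_rank1 += np.outer(A[:, p], B[p, :])

assert np.allclose(C_cols, C_ref)
assert np.allclose(C_rows, C_ref)
assert np.allclose(C_rank1, C_ref)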

These three cases are special cases of the more general observation that, if we can partition \(C \text{,}\) \(A \text{,}\) and \(B \) by blocks (submatrices),

\begin{equation*} C = \left( \begin{array}{c | c | c} C_{0,0} \amp \cdots \amp C_{0,N-1} \\ \hline \vdots \amp \amp \vdots \\ \hline C_{M-1,0} \amp \cdots \amp C_{M-1,N-1} \end{array} \right), \quad A = \left( \begin{array}{c | c | c} A_{0,0} \amp \cdots \amp A_{0,K-1} \\ \hline \vdots \amp \amp \vdots \\ \hline A_{M-1,0} \amp \cdots \amp A_{M-1,K-1} \end{array} \right), \end{equation*}

and

\begin{equation*} B = \left( \begin{array}{c | c | c} B_{0,0} \amp \cdots \amp B_{0,N-1} \\ \hline \vdots \amp \amp \vdots \\ \hline B_{K-1,0} \amp \cdots \amp B_{K-1,N-1} \end{array} \right), \end{equation*}

where the partitionings are "conformal" (the blocks are sized so that every product \(A_{i,p} B_{p,j} \) is defined and has the same size as \(C_{i,j} \)), then

\begin{equation*} C_{i,j} = \sum_{p=0}^{K-1} A_{i,p} B_{p,j} . \end{equation*}
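A sketch of this blocked formula, assuming Python with NumPy and a block size chosen (arbitrarily) to divide all three dimensions:

import numpy as np

rng = np.random.default_rng(4)
m, k, n = 4, 6, 8
b = 2                                  # block size; here it divides m, k, and n
A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
C = np.zeros((m, n))

# C_{i,j} = sum_p A_{i,p} B_{p,j}, each capital index selecting a b x b block
for i in range(0, m, b):
    for j in range(0, n, b):
        for p in range(0, k, b):
            C[i:i+b, j:j+b] += A[i:i+b, p:p+b] @ B[p:p+b, j:j+b]

assert np.allclose(C, A @ B)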
Remark 1.3.1.6.

If the above review of linear transformations, matrices, matrix-vector multiplication, and matrix-matrix multiplication makes you exclaim "That is all a bit too fast for me!" then it is time for you to take a break and review Weeks 2-5 of our introductory linear algebra course "Linear Algebra: Foundations to Frontiers." Information, including notes [27] (optionally downloadable for free) and a link to the course on edX [28] (which can be audited for free) can be found at http://ulaff.net.