Remark
- In what follows, we treat the more general case in which a matrix can have complex-valued entries. The exposition is easily adapted to the case where the matrix is real valued.
- This material is probably best suited to students who have had a course in linear algebra already.
Definition 1
A matrix \( A \in \mathbb{C}^{m \times m} \) is Hermitian positive definite (HPD) if and only if it is Hermitian (\(A^H = A\)) and for all nonzero vectors \( x \in \mathbb{C}^m \) it is the case that \( x ^H A x > 0 \). If in addition \( A \in \mathbb{R}^{m \times m} \) then \( A \) is said to be symmetric positive definite (SPD).
First some exercises:
Exercise 2
Let \( B \in \mathbb{C}^{m \times n} \) have linearly independent columns. Prove that \( A = B^H B \) is HPD.
Exercise 3
Let \( A \in \mathbb{C}^{m \times m} \) be HPD. Show that its diagonal elements are real and positive.
We will prove the following theorem later:
Theorem 4 (Cholesky Factorization Theorem)
Given a HPD matrix \( A \) there exists a lower triangular matrix \( L \) such that \( A = L L^H \).
The lower triangular matrix $ L $ is known as the Cholesky factor and \( L L^H \) is known as the Cholesky factorization of \( A \). It is unique if the diagonal elements of \( L \) are restricted to be positive real.
The operation that overwrites the lower triangular part of matrix \( A \) with its Cholesky factor will be denoted by \( A := {\rm Chol}( A ) \), which should be read as ``\( A \) becomes its Cholesky factor.'' Typically, only the lower (or upper) triangular part of \( A \) is stored, and it is that part that is then overwritten with the result. In this discussion, we will assume that the lower triangular part of \( A \) is stored and overwritten.
The Cholesky factorization is used to solve the linear system \( A x = y \) when \( A \) is HPD: Substituting the factors into the equation yields \( L L^H x = y \). Letting \( z = L^H x \), $$ A x = L ( L^H x ) = L z = y . $$ Thus, \( z \) can be computed by solving the triangular system of equations \( L z = y \) and subsequently the desired solution \( x \) can be computed by solving the triangular linear system \( L^H x = z \).
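To make the two triangular solves concrete, here is a minimal real-valued Python sketch; the helper names `forward_sub` and `back_sub_trans` are our own, not a standard API, and \( L \) is stored as a dense list of lists.

```python
# Solving A x = y via the Cholesky factor L (real case, A = L L^T).
# A minimal sketch; forward_sub and back_sub_trans are hypothetical helpers.

def forward_sub(L, y):
    # Solve L z = y by forward substitution (L lower triangular).
    n = len(y)
    z = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * z[j] for j in range(i))
        z[i] = (y[i] - s) / L[i][i]
    return z

def back_sub_trans(L, z):
    # Solve L^T x = z by back substitution, reading L^T's entries from L.
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[j][i] * x[j] for j in range(i + 1, n))
        x[i] = (z[i] - s) / L[i][i]
    return x

# For A = [[4, 2], [2, 3]] the Cholesky factor is L = [[2, 0], [1, sqrt(2)]].
L = [[2.0, 0.0], [1.0, 2.0 ** 0.5]]
y = [8.0, 7.0]
x = back_sub_trans(L, forward_sub(L, y))  # x solves A x = y
```

Note that the cost of each triangular solve is quadratic in the matrix dimension, so once the factorization is available, additional right-hand sides are cheap.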
The most common algorithm for computing $ A := {\rm Chol}(A) $ can be derived as follows:
Remark
We adopt the commonly used notation where Greek lower case letters refer to scalars, lower case letters refer to (column) vectors, and upper case letters refer to matrices. The \( \star \) refers to a part of \( A \) that is neither stored nor updated.
Thus, here \( \alpha_{11} \) refers to the top-left element of matrix $ A $, $ a_{21} $ to the rest of the first column, and $ A_{22} $ to the submatrix that remains after the first row and column of $ A $ are removed.
Algorithm 5 (Cholesky factorization)
- Partition \( A \rightarrow \left( \begin{array}{c | c} { \alpha_{11} } & { \star } \\ \hline { a_{21} } & { A_{22} } \end{array} \right) \).
- Overwrite \( \alpha_{11} := \lambda_{11} = \sqrt{\alpha_{11}} \).
(Picking \( \lambda_{11} = \sqrt{\alpha_{11}} \) makes it positive and real, and ensures uniqueness.)
- Overwrite \( a_{21} := l_{21} = a_{21} / \lambda_{11} \).
- Overwrite \( A_{22} := A_{22} - l_{21} l_{21}^H \)
(updating only the lower triangular part of \( A_{22} \)).
- Continue with \( A = A_{22} \). (Back to Step 1.)
Code used by Matlab and Octave (a free Matlab) is given by
function [ Aout ] = Chol( A )
  % Get the matrix size
  n = size( A, 1 );
  % Loop over the columns of A
  for j=1:n
    % alpha11 = sqrt( alpha11 )
    A( j, j ) = sqrt( A( j, j ) );
    % a21 = a21 / alpha11
    A( j+1:n, j ) = A( j+1:n, j ) / A( j, j );
    % A22 = A22 - a21 * a21' (update only lower triangular part)
    A( j+1:n, j+1:n ) = A( j+1:n, j+1:n ) - tril( A( j+1:n, j ) * A( j+1:n, j )' );
  endfor
  Aout = A;
return
or, equivalently,
function [ Aout ] = Chol( A )
  % Get the matrix size
  n = size( A, 1 );
  % Loop over the columns of A
  for j=1:n
    % alpha11 = sqrt( alpha11 )
    A( j, j ) = sqrt( A( j, j ) );
    % a21 = a21 / alpha11
    for i=j+1:n
      A( i, j ) = A( i, j ) / A( j, j );
    endfor
    % A22 = A22 - a21 * a21' (update only lower triangular part)
    for k=j+1:n
      for i=k:n
        A( i, k ) = A( i, k ) - A( i, j ) * conj( A( k, j ) );
      endfor
    endfor
  endfor
  Aout = A;
return
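For readers following along in Python rather than Octave, a direct real-valued transcription of the loop-based version might look like the following sketch (the function name `chol` and list-of-lists storage are our choices):

```python
import math

def chol(A):
    # Unblocked Cholesky factorization, in place, for a real SPD matrix
    # stored as a list of lists; only the lower triangle is read or written.
    n = len(A)
    for j in range(n):
        # alpha11 := sqrt(alpha11)
        A[j][j] = math.sqrt(A[j][j])
        # a21 := a21 / alpha11
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]
        # A22 := A22 - a21 * a21^T (lower triangular part only)
        for k in range(j + 1, n):
            for i in range(k, n):
                A[i][k] -= A[i][j] * A[k][j]
    return A

A = [[4.0, 2.0], [2.0, 3.0]]
chol(A)  # lower triangle of A now holds L
```

Since only the lower triangle is touched, the strictly upper triangular entries of `A` are left unchanged, exactly as in the Octave versions above.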
In the discussion below, we again partition \[ A = \left( \begin{array}{c | c} { \alpha_{11} } & { \star } \\ \hline { a_{21} } & { A_{22} } \end{array} \right) \quad \mbox{and} \quad L = \left( \begin{array}{c | c} { \lambda_{11} } & { 0 } \\ \hline { l_{21} } & { L_{22} } \end{array} \right) . \]
The following lemmas, which can be found in any standard text, are key to the proof:
Lemma 6
Let \( A \in \mathbb{C}^{m \times m} \) be HPD. Then \( \alpha_{11} \) is real and positive.
Proof
This is a special case of Exercise 3.
Lemma 7
Let \( A \in \mathbb{C}^{m \times m} \) be HPD and \( l_{21} = a_{21} / \sqrt{\alpha_{11}} \). Then \( A_{22} - l_{21} l_{21}^H \) is HPD.
Proof
Since \( A \) is Hermitian, clearly so are \( A_{22} \) and \( A_{22} - l_{21} l_{21}^H \). Let \( x_1 \neq 0 \) be an arbitrary vector of length \( m-1 \). Define \[ x = \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right) \quad \mbox{where } \chi_0 = - a_{21}^H x_1 / \alpha_{11} . \] Then, since \( x \neq 0 \), \[ \begin{array}{r c l} 0 < x^H A x &=& \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right)^H \left( \begin{array}{c|c} { \alpha_{11}} & { a_{21}^H } \\ \hline { a_{21} } & { A_{22} } \end{array} \right) \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right) \\ & = & \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right)^H \left( \begin{array}{c} { \alpha_{11} \chi_0 + a_{21}^H x_1 } \\ \hline { a_{21} \chi_0 + A_{22} x_1 } \end{array} \right) \\ &=& \alpha_{11} \vert \chi_0 \vert^2 + \bar \chi_0 a_{21}^H x_1 + x_1^H a_{21} \chi_0 + x_1^H A_{22} x_1 \\ & = & \alpha_{11} \frac{a_{21}^H x_1}{\alpha_{11}} \frac{x_{1}^H a_{21}}{\alpha_{11}} - \frac{x_{1}^H a_{21}}{\alpha_{11}} a_{21}^H x_1 - x_1^H a_{21} \frac{a_{21}^H x_1}{\alpha_{11}} + x_1^H A_{22} x_1 \\ &=& x_1^H \left( A_{22} - \frac{a_{21} a_{21}^H}{\alpha_{11}} \right) x_1 ~~~\mbox{(since \( x_1^H a_{21} a_{21}^H x_1 \) is real and hence equals \( a_{21}^H x_1 x_1^H a_{21} \))}\\ &=& x_1^H ( A_{22} - l_{21} l_{21}^H ) x_1. \end{array} \] We conclude that \( A_{22} - l_{21} l_{21}^H \) is HPD.
Proof of Cholesky Factorization Theorem
The proof is by induction on the matrix dimension \( n \).
Base case: \( n = 1 \). Clearly the result is true for a \( 1 \times 1 \) matrix \( A = \alpha_{11} \): In this case, the fact that \( A \) is HPD means that \( \alpha_{11} \) is real and positive and a Cholesky factor is then given by \( \lambda_{11} = \sqrt{\alpha_{11}} \), with uniqueness if we insist that \( \lambda_{11} \) is positive.
Inductive step: Assume the result is true for HPD matrix \( A \in \mathbb{C}^{(n-1) \times (n-1)} \). We will show that it holds for \( A \in \mathbb{C}^{n \times n} \). Let \( A \in \mathbb{C}^{n \times n} \) be HPD. Let \( \lambda_{11} = \sqrt{\alpha_{11}} \) (which is well-defined by Lemma 6), \( l_{21} = a_{21} / \lambda_{11} \), and \( L_{22} = {\rm Chol}( A_{22} - l_{21} l_{21}^H ) \) (which exists as a consequence of the Inductive Hypothesis and Lemma 7). Then \( L \) is the desired Cholesky factor of \( A \).
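That these choices indeed yield a Cholesky factor can be verified by multiplying out the partitioned factors:
\[
L L^H =
\left( \begin{array}{c | c} { \lambda_{11} } & { 0 } \\ \hline { l_{21} } & { L_{22} } \end{array} \right)
\left( \begin{array}{c | c} { \lambda_{11} } & { l_{21}^H } \\ \hline { 0 } & { L_{22}^H } \end{array} \right)
=
\left( \begin{array}{c | c} { \lambda_{11}^2 } & { \lambda_{11} l_{21}^H } \\ \hline { \lambda_{11} l_{21} } & { l_{21} l_{21}^H + L_{22} L_{22}^H } \end{array} \right)
=
\left( \begin{array}{c | c} { \alpha_{11} } & { a_{21}^H } \\ \hline { a_{21} } & { A_{22} } \end{array} \right)
= A,
\]
since \( \lambda_{11}^2 = \alpha_{11} \), \( \lambda_{11} l_{21} = a_{21} \), and \( L_{22} L_{22}^H = A_{22} - l_{21} l_{21}^H \) by construction.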
By the principle of mathematical induction, the theorem holds.
In order to attain high performance, the computation is cast in terms of matrix-matrix multiplication by so-called blocked algorithms. For the Cholesky factorization a blocked version of the algorithm can be derived by partitioning \[ A \rightarrow A = \left( \begin{array}{c | c} { A_{11} } & { \star } \\ \hline { A_{21} } & { A_{22} } \end{array} \right) \quad \mbox{and} \quad L \rightarrow \left( \begin{array}{c | c} { L_{11} } & { 0 } \\ \hline { L_{21} } & { L_{22} } \end{array} \right), \] where \( A_{11} \) and \( L_{11} \) are \( b \times b \). By substituting these partitioned matrices into \( A = L L^H \) we find that \[ \left( \begin{array}{c | c} { A_{11} }&{ \star } \\ \hline { A_{21} }&{ A_{22} } \end{array} \right) = \left( \begin{array}{c | c} { L_{11} }&{ 0 } \\ \hline { L_{21} }&{ L_{22} } \end{array} \right) \left( \begin{array}{c | c} { L_{11} }&{ 0 } \\ \hline { L_{21} }&{ L_{22} } \end{array} \right)^H = \left( \begin{array}{c | c} { L_{11} L_{11}^H } & { \star } \\ \hline { L_{21} L_{11}^H } & { L_{21} L_{21}^H + L_{22}^{\phantom{{}^{1}}} L_{22}^H } \end{array} \right). \] From this we conclude that \[ \begin{array}{c | c} { L_{11} = {\rm Chol}( A_{11} ) } & \\ \hline { L_{21} = A_{21} L_{11}^{-H} } & { L_{22}^{\phantom{{}^{1}}} = {\rm Chol}( A_{22} - L_{21} L_{21}^H ) }. \end{array} \] These equalities motivate the algorithm
Algorithm 8 (Blocked Cholesky factorization)
- Partition \( A \rightarrow \left( \begin{array}{c | c} { A_{11} } & { \star } \\ \hline { A_{21} } & { A_{22} } \end{array} \right) \).
- Overwrite \( A_{11} := L_{11} = {\rm Chol}( A_{11} ) \).
- Overwrite \( A_{21} := L_{21} = A_{21} L_{11}^{-H} \).
- Overwrite \( A_{22} := A_{22} - L_{21} L_{21}^H \)
(updating only the lower triangular part of \( A_{22} \)).
- Continue with \( A = A_{22} \). (Back to Step 1.)
Remark
The Cholesky factorization \( A_{11} := L_{11} = {\rm Chol}( A_{11} ) \) can be computed with the unblocked algorithm or by calling the blocked Cholesky factorization algorithm recursively. Operations like \( L_{21} = A_{21} L_{11}^{-H} \) are computed by solving the equivalent linear system with multiple right-hand sides \( L_{11} L_{21}^H = A_{21}^H \).
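As an illustration of Algorithm 8, here is a hedged real-valued Python sketch. The function name, the list-of-lists storage, and the default block size `b` are our choices; a production code would instead call optimized BLAS routines for the triangular solve and the rank-\( b \) update.

```python
import math

def chol_blocked(A, b=2):
    # Blocked Cholesky factorization (real case), in place, on a
    # list-of-lists SPD matrix; only the lower triangle is read or written.
    # Block size b is arbitrary here; in practice it is tuned to the cache.
    n = len(A)
    for p in range(0, n, b):
        q = min(p + b, n)
        # A11 := L11 = Chol(A11): unblocked factorization of diagonal block.
        for j in range(p, q):
            A[j][j] = math.sqrt(A[j][j])
            for i in range(j + 1, q):
                A[i][j] /= A[j][j]
            for k in range(j + 1, q):
                for i in range(k, q):
                    A[i][k] -= A[i][j] * A[k][j]
        # A21 := L21 = A21 * L11^{-T}: for each row of A21, solve
        # L11 * (row)^T = (row)^T by forward substitution.
        for i in range(q, n):
            for j in range(p, q):
                s = sum(A[j][k] * A[i][k] for k in range(p, j))
                A[i][j] = (A[i][j] - s) / A[j][j]
        # A22 := A22 - L21 * L21^T (lower triangular part only).
        for k in range(q, n):
            for i in range(k, n):
                A[i][k] -= sum(A[i][j] * A[k][j] for j in range(p, q))
    return A

# Example: factor a 4x4 SPD matrix with block size 2.
A0 = [[4.0, 1.0, 0.0, 0.0],
      [1.0, 3.0, 1.0, 0.0],
      [0.0, 1.0, 2.0, 0.5],
      [0.0, 0.0, 0.5, 3.0]]
L = chol_blocked([row[:] for row in A0], b=2)
```

With `b=1` the routine reduces to the unblocked algorithm, which is one way to check that the blocked formulation computes the same factor.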
The cost of the Cholesky factorization of \( A \in \mathbb{C}^{m \times m} \) can be analyzed as follows: In Algorithm 5, during the \( k \)th iteration (starting \( k \) at zero) the already factored part \( A_{00} \) is \( k \times k \), so that \( a_{21} \) has length \( m-k-1 \) and \( A_{22} \) is \( (m-k-1) \times (m-k-1) \). Thus, the operations in that iteration cost approximately \( m-k-1 \) flops for the update of \( a_{21} \) and \( (m-k-1)^2 \) flops for the update of the lower triangular part of \( A_{22} \) (the square root contributes only a lower-order term).
Thus, the total cost in flops is given by \[ \begin{array}{rcl} C_{\rm Chol}( m )& \approx& \mbox{Cost due to update of $A_{22}$} + \mbox{Cost due to update of $a_{21}$} \\ & = & \sum_{k=0}^{m-1} (m-k-1)^2 + \sum_{k=0}^{m-1} (m-k-1) \\ &=& \sum_{j=0}^{m-1} j^2 + \sum_{j=0}^{m-1} j \approx \frac{1}{3}m^3 + \frac{1}{2} m^2 \approx \frac{1}{3}m^3 \end{array} \] from which we conclude that (not surprisingly) most of the computation is in the update of \( A_{22} \). In this analysis, we are counting multiplications and additions separately.
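As a quick sanity check on the approximation, the exact sums can be compared with \( \frac{1}{3}m^3 \) for a modest matrix dimension (the choice \( m = 100 \) below is arbitrary):

```python
# Compare the exact flop counts from the analysis with the m^3/3 estimate.
m = 100
cost_a21 = sum(m - k - 1 for k in range(m))         # updates of a21
cost_A22 = sum((m - k - 1) ** 2 for k in range(m))  # updates of A22
exact = cost_a21 + cost_A22
print(exact, m ** 3 // 3)  # the two agree to within a fraction of a percent
```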
Exercise 9
Instead of storing the Hermitian matrix \( A \) in the lower triangular part of the matrix, one could store it in the upper triangular part and compute an upper triangular matrix \( U \) such that \( A = U^H U \). Propose an algorithm that overwrites the upper triangular part of \( A \) with \( U \).
Exercise 10
Let \( A \) be HPD. Partition \[ A = \left( \begin{array}{c | c} {A_{00} } & { \star } \\ \hline { a_{10}^T } & { \alpha_{11} } \end{array} \right) \] where $ \alpha_{11} $ is a scalar and $ a_{10}^T $ denotes the row vector to its left. Prove that $ A_{00} $ is HPD.
Exercise 11
Let \( A \) be SPD (and thus real valued). Partition \[ A = \left( \begin{array}{c | c} {A_{00} } & { \star } \\ \hline { a_{10}^T } & { \alpha_{11} } \end{array} \right) \] where $ \alpha_{11} $ is a scalar and $ a_{10}^T $ denotes the row vector to its left. Show that $ \alpha_{11} - a_{10}^T A_{00}^{-1} a_{10} > 0 $. (Hint: mimic the argument in the proof of Lemma 7.)
This paper describes a number of unblocked and blocked high-performance algorithms for Cholesky factorization and related operations. While there are many papers written on the subject, this one uses notation similar to that used in this note.