Remark
- In what follows, we treat the more general case in which a matrix can have complex-valued entries. The exposition is easily adapted to the case where the matrix is real valued.
- This material is probably best suited to students who have had a course in linear algebra already.
Definition 1
A matrix \( A \in \mathbb{C}^{m \times m} \) is Hermitian positive definite (HPD) if and only if it is Hermitian (\(A^H = A\)) and for all nonzero vectors \( x \in \mathbb{C}^m \) it is the case that \( x ^H A x > 0 \). If in addition \( A \in \mathbb{R}^{m \times m} \) then \( A \) is said to be symmetric positive definite (SPD).
First some exercises:
Exercise 2
Let \( B \in \mathbb{C}^{m \times n} \) have linearly independent columns. Prove that \( A = B^H B \) is HPD.
Exercise 3
Let \( A \in \mathbb{C}^{m \times m} \) be HPD. Show that its diagonal elements are real and positive.
We will prove the following theorem later:
Theorem 4 (Cholesky Factorization Theorem)
Given a HPD matrix \( A \) there exists a lower triangular matrix \( L \) such that \( A = L L^H \).
The lower triangular matrix $ L $ is known as the Cholesky factor and \( L L^H \) is known as the Cholesky factorization of \( A \). It is unique if the diagonal elements of \( L \) are restricted to be positive real.
The operation that overwrites the lower triangular part of matrix \( A \) with its Cholesky factor will be denoted by \( A := {\rm Chol}( A ) \), which should be read as ``\( A \) becomes its Cholesky factor.'' Typically, only the lower (or upper) triangular part of \( A \) is stored, and it is that part that is then overwritten with the result. In this discussion, we will assume that the lower triangular part of \( A \) is stored and overwritten.
The Cholesky factorization is used to solve the linear system \( A x = y \) when \( A \) is HPD: Substituting the factors into the equation yields \( L L^H x = y \). Letting \( z = L^H x \), $$ A x = L ( L^H x ) = L z = y . $$ Thus, \( z \) can be computed by solving the triangular system of equations \( L z = y \) and subsequently the desired solution \( x \) can be computed by solving the triangular linear system \( L^H x = z \).
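To make the two triangular solves concrete, here is a minimal real-valued Python sketch; the helper names `forward_sub` and `back_sub_trans` are our own, not a standard API, and \( L \) is stored as a dense list of lists.

```python
# Solving A x = y via the Cholesky factor L (real case, A = L L^T).
# A minimal sketch; forward_sub and back_sub_trans are hypothetical helpers.

def forward_sub(L, y):
    # Solve L z = y by forward substitution (L lower triangular).
    n = len(y)
    z = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * z[j] for j in range(i))
        z[i] = (y[i] - s) / L[i][i]
    return z

def back_sub_trans(L, z):
    # Solve L^T x = z by back substitution, reading L^T's entries from L.
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[j][i] * x[j] for j in range(i + 1, n))
        x[i] = (z[i] - s) / L[i][i]
    return x

# For A = [[4, 2], [2, 3]] the Cholesky factor is L = [[2, 0], [1, sqrt(2)]].
L = [[2.0, 0.0], [1.0, 2.0 ** 0.5]]
y = [8.0, 7.0]
x = back_sub_trans(L, forward_sub(L, y))  # x solves A x = y
```

Note that the cost of each triangular solve is quadratic in the matrix dimension, so once the factorization is available, additional right-hand sides are cheap.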
The most common algorithm for computing $ A := {\rm Chol}(A) $ can be derived as follows:
Remark
We adopt the commonly used notation where Greek lower case letters refer to scalars, lower case letters refer to (column) vectors, and upper case letters refer to matrices. The \( \star \) refers to a part of \( A \) that is neither stored nor updated.
Thus, here \( \alpha_{11} \) refers to the top-left element of matrix $ A $, $ a_{21} $ to the rest of the first column, and $ A_{22} $ to the submatrix that remains after the first row and column of $ A $ are removed.
Algorithm 5 (Cholesky factorization)
- Partition \( A \rightarrow \left( \begin{array}{c | c} { \alpha_{11} } & { \star } \\ \hline { a_{21} } & { A_{22} } \end{array} \right) \).
- Overwrite \( \alpha_{11} := \lambda_{11} = \sqrt{\alpha_{11}} \).
(Picking \( \lambda_{11} = \sqrt{\alpha_{11}} \) makes it positive and real, and ensures uniqueness.)
- Overwrite \( a_{21} := l_{21} = a_{21} / \lambda_{11} \).
- Overwrite \( A_{22} := A_{22} - l_{21} l_{21}^H \)
(updating only the lower triangular part of \( A_{22} \)).
- Continue with \( A = A_{22} \). (Back to Step 1.)
Code used by Matlab and Octave (a free Matlab) is given by
function [ Aout ] = Chol( A )
  % Get the matrix size
  n = size( A, 1 );
  % Loop over the columns of A
  for j=1:n
    % alpha11 = sqrt( alpha11 )
    A( j, j ) = sqrt( A( j, j ) );
    % a21 = a21 / alpha11
    A( j+1:n, j ) = A( j+1:n, j ) / A( j, j );
    % A22 = A22 - a21 * a21' (update only lower triangular part)
    A( j+1:n, j+1:n ) = A( j+1:n, j+1:n ) - tril( A( j+1:n, j ) * A( j+1:n, j )' );
  endfor
  Aout = A;
return
or, equivalently,
function [ Aout ] = Chol( A )
  % Get the matrix size
  n = size( A, 1 );
  % Loop over the columns of A
  for j=1:n
    % alpha11 = sqrt( alpha11 )
    A( j, j ) = sqrt( A( j, j ) );
    % a21 = a21 / alpha11
    for i=j+1:n
      A( i, j ) = A( i, j ) / A( j, j );
    endfor
    % A22 = A22 - a21 * a21' (update only lower triangular part)
    for k=j+1:n
      for i=k:n
        A( i, k ) = A( i, k ) - A( i, j ) * conj( A( k, j ) );
      endfor
    endfor
  endfor
  Aout = A;
return
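For readers following along in Python rather than Octave, a direct real-valued transcription of the loop-based version might look like the following sketch (the function name `chol` and list-of-lists storage are our choices):

```python
import math

def chol(A):
    # Unblocked Cholesky factorization, in place, for a real SPD matrix
    # stored as a list of lists; only the lower triangle is read or written.
    n = len(A)
    for j in range(n):
        # alpha11 := sqrt(alpha11)
        A[j][j] = math.sqrt(A[j][j])
        # a21 := a21 / alpha11
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]
        # A22 := A22 - a21 * a21^T (lower triangular part only)
        for k in range(j + 1, n):
            for i in range(k, n):
                A[i][k] -= A[i][j] * A[k][j]
    return A

A = [[4.0, 2.0], [2.0, 3.0]]
chol(A)  # lower triangle of A now holds L
```

Since only the lower triangle is touched, the strictly upper triangular entries of `A` are left unchanged, exactly as in the Octave versions above.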
In the discussion below, we again partition \[ A = \left( \begin{array}{c | c} { \alpha_{11} } & { \star } \\ \hline { a_{21} } & { A_{22} } \end{array} \right) \quad \mbox{and} \quad L = \left( \begin{array}{c | c} { \lambda_{11} } & { 0 } \\ \hline { l_{21} } & { L_{22} } \end{array} \right) . \]
The following lemmas, which can be found in any standard text, are key to the proof:
Lemma 6
Let \( A \in \mathbb{C}^{m \times m} \) be HPD. Then \( \alpha_{11} \) is real and positive.
Proof
This is a special case of Exercise 3.
Lemma 7
Let \( A \in \mathbb{C}^{m \times m} \) be HPD and \( l_{21} = a_{21} / \sqrt{\alpha_{11}} \). Then \( A_{22} - l_{21} l_{21}^H \) is HPD.
Proof
Since \( A \) is Hermitian, clearly so are \( A_{22} \) and \( A_{22} - l_{21} l_{21}^H \). Let \( x_1 \neq 0 \) be an arbitrary vector of length \( m-1 \). Define \[ x = \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right) \quad \mbox{where } \chi_0 = - a_{21}^H x_1 / \alpha_{11} . \] Then, since \( x \neq 0 \), \[ \begin{array}{r c l} 0 < x^H A x &=& \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right)^H \left( \begin{array}{c|c} { \alpha_{11}} & { a_{21}^H } \\ \hline { a_{21} } & { A_{22} } \end{array} \right) \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right) \\ & = & \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right)^H \left( \begin{array}{c} { \alpha_{11} \chi_0 + a_{21}^H x_1 } \\ \hline { a_{21} \chi_0 + A_{22} x_1 } \end{array} \right) \\ &=& \alpha_{11} \vert \chi_0 \vert^2 + \bar \chi_0 a_{21}^H x_1 + x_1^H a_{21} \chi_0 + x_1^H A_{22} x_1 \\ & = & \alpha_{11} \frac{a_{21}^H x_1}{\alpha_{11}} \frac{x_{1}^H a_{21}}{\alpha_{11}} - \frac{x_{1}^H a_{21}}{\alpha_{11}} a_{21}^H x_1 - x_1^H a_{21} \frac{a_{21}^H x_1}{\alpha_{11}} + x_1^H A_{22} x_1 \\ &=& x_1^H \left( A_{22} - \frac{a_{21} a_{21}^H}{\alpha_{11}} \right) x_1 ~~~\mbox{(since \( x_1^H a_{21} a_{21}^H x_1 \) is real and hence equals \( a_{21}^H x_1 x_1^H a_{21} \))}\\ &=& x_1^H ( A_{22} - l_{21} l_{21}^H ) x_1. \end{array} \] We conclude that \( A_{22} - l_{21} l_{21}^H \) is HPD.
Proof of Cholesky Factorization Theorem
The proof is by induction on the matrix dimension \( n \).
Base case: \( n = 1 \). Clearly the result is true for a \( 1 \times 1 \) matrix \( A = \alpha_{11} \): In this case, the fact that \( A \) is HPD means that \( \alpha_{11} \) is real and positive and a Cholesky factor is then given by \( \lambda_{11} = \sqrt{\alpha_{11}} \), with uniqueness if we insist that \( \lambda_{11} \) is positive.
Inductive step: Assume the result is true for HPD matrix \( A \in \mathbb{C}^{(n-1) \times (n-1)} \). We will show that it holds for \( A \in \mathbb{C}^{n \times n} \). Let \( A \in \mathbb{C}^{n \times n} \) be HPD. Let \( \lambda_{11} = \sqrt{\alpha_{11}} \) (which is well-defined by Lemma 6), \( l_{21} = a_{21} / \lambda_{11} \), and \( L_{22} = {\rm Chol}( A_{22} - l_{21} l_{21}^H ) \) (which exists as a consequence of the Inductive Hypothesis and Lemma 7). Then \( L \) is the desired Cholesky factor of \( A \).
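That these choices indeed yield a Cholesky factor can be verified by multiplying out the partitioned factors:
\[
L L^H =
\left( \begin{array}{c | c} { \lambda_{11} } & { 0 } \\ \hline { l_{21} } & { L_{22} } \end{array} \right)
\left( \begin{array}{c | c} { \lambda_{11} } & { l_{21}^H } \\ \hline { 0 } & { L_{22}^H } \end{array} \right)
=
\left( \begin{array}{c | c} { \lambda_{11}^2 } & { \lambda_{11} l_{21}^H } \\ \hline { \lambda_{11} l_{21} } & { l_{21} l_{21}^H + L_{22} L_{22}^H } \end{array} \right)
=
\left( \begin{array}{c | c} { \alpha_{11} } & { a_{21}^H } \\ \hline { a_{21} } & { A_{22} } \end{array} \right)
= A,
\]
since \( \lambda_{11}^2 = \alpha_{11} \), \( \lambda_{11} l_{21} = a_{21} \), and \( L_{22} L_{22}^H = A_{22} - l_{21} l_{21}^H \) by construction.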
By the principle of mathematical induction, the theorem holds.
In order to attain high performance, the computation is cast in terms of matrix-matrix multiplication by so-called blocked algorithms. For the Cholesky factorization a blocked version of the algorithm can be derived by partitioning \[ A \rightarrow A = \left( \begin{array}{c | c} { A_{11} } & { \star } \\ \hline { A_{21} } & { A_{22} } \end{array} \right) \quad \mbox{and} \quad L \rightarrow \left( \begin{array}{c | c} { L_{11} } & { 0 } \\ \hline { L_{21} } & { L_{22} } \end{array} \right), \] where \( A_{11} \) and \( L_{11} \) are \( b \times b \). By substituting these partitioned matrices into \( A = L L^H \) we find that \[ \left( \begin{array}{c | c} { A_{11} }&{ \star } \\ \hline { A_{21} }&{ A_{22} } \end{array} \right) = \left( \begin{array}{c | c} { L_{11} }&{ 0 } \\ \hline { L_{21} }&{ L_{22} } \end{array} \right) \left( \begin{array}{c | c} { L_{11} }&{ 0 } \\ \hline { L_{21} }&{ L_{22} } \end{array} \right)^H = \left( \begin{array}{c | c} { L_{11} L_{11}^H } & { \star } \\ \hline { L_{21} L_{11}^H } & { L_{21} L_{21}^H + L_{22}^{\phantom{{}^{1}}} L_{22}^H } \end{array} \right). \] From this we conclude that \[ \begin{array}{c | c} { L_{11} = {\rm Chol}( A_{11} ) } & \\ \hline { L_{21} = A_{21} L_{11}^{-H} } & { L_{22}^{\phantom{{}^{1}}} = {\rm Chol}( A_{22} - L_{21} L_{21}^H ) }. \end{array} \] These equalities motivate the algorithm
Algorithm 8 (Blocked Cholesky factorization)
- Partition \( A \rightarrow \left( \begin{array}{c | c} { A_{11} } & { \star } \\ \hline { A_{21} } & { A_{22} } \end{array} \right) \).
- Overwrite \( A_{11} := L_{11} = {\rm Chol}( A_{11} ) \).
- Overwrite \( A_{21} := L_{21} = A_{21} L_{11}^{-H} \).
- Overwrite \( A_{22} := A_{22} - L_{21} L_{21}^H \)
(updating only the lower triangular part of \( A_{22} \)).
- Continue with \( A = A_{22} \). (Back to Step 1.)
Remark
The Cholesky factorization \( A_{11} := L_{11} = {\rm Chol}( A_{11} ) \) can be computed with the unblocked algorithm or by calling the blocked Cholesky factorization algorithm recursively. Operations like \( L_{21} = A_{21} L_{11}^{-H} \) are computed by solving the equivalent linear system with multiple right-hand sides \( L_{11} L_{21}^H = A_{21}^H \).
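As an illustration of Algorithm 8, here is a hedged real-valued Python sketch. The function name, the list-of-lists storage, and the default block size `b` are our choices; a production code would instead call optimized BLAS routines for the triangular solve and the rank-\( b \) update.

```python
import math

def chol_blocked(A, b=2):
    # Blocked Cholesky factorization (real case), in place, on a
    # list-of-lists SPD matrix; only the lower triangle is read or written.
    # Block size b is arbitrary here; in practice it is tuned to the cache.
    n = len(A)
    for p in range(0, n, b):
        q = min(p + b, n)
        # A11 := L11 = Chol(A11): unblocked factorization of diagonal block.
        for j in range(p, q):
            A[j][j] = math.sqrt(A[j][j])
            for i in range(j + 1, q):
                A[i][j] /= A[j][j]
            for k in range(j + 1, q):
                for i in range(k, q):
                    A[i][k] -= A[i][j] * A[k][j]
        # A21 := L21 = A21 * L11^{-T}: for each row of A21, solve
        # L11 * (row)^T = (row)^T by forward substitution.
        for i in range(q, n):
            for j in range(p, q):
                s = sum(A[j][k] * A[i][k] for k in range(p, j))
                A[i][j] = (A[i][j] - s) / A[j][j]
        # A22 := A22 - L21 * L21^T (lower triangular part only).
        for k in range(q, n):
            for i in range(k, n):
                A[i][k] -= sum(A[i][j] * A[k][j] for j in range(p, q))
    return A

# Example: factor a 4x4 SPD matrix with block size 2.
A0 = [[4.0, 1.0, 0.0, 0.0],
      [1.0, 3.0, 1.0, 0.0],
      [0.0, 1.0, 2.0, 0.5],
      [0.0, 0.0, 0.5, 3.0]]
L = chol_blocked([row[:] for row in A0], b=2)
```

With `b=1` the routine reduces to the unblocked algorithm, which is one way to check that the blocked formulation computes the same factor.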
The cost of the Cholesky factorization of \( A \in \mathbb{C}^{m \times m} \) can be analyzed as follows: In Algorithm 5, during the \( k \)th iteration (starting \( k \) at zero) the already factored part \( A_{00} \) is \( k \times k \), so that \( a_{21} \) has length \( m-k-1 \) and \( A_{22} \) is \( (m-k-1) \times (m-k-1) \). Thus, the operations in that iteration cost approximately \( m-k-1 \) flops for the update of \( a_{21} \) and \( (m-k-1)^2 \) flops for the update of the lower triangular part of \( A_{22} \) (the square root contributes only a lower-order term).
Thus, the total cost in flops is given by \[ \begin{array}{rcl} C_{\rm Chol}( m )& \approx& \mbox{Cost due to update of $A_{22}$} + \mbox{Cost due to update of $a_{21}$} \\ & = & \sum_{k=0}^{m-1} (m-k-1)^2 + \sum_{k=0}^{m-1} (m-k-1) \\ &=& \sum_{j=0}^{m-1} j^2 + \sum_{j=0}^{m-1} j \approx \frac{1}{3}m^3 + \frac{1}{2} m^2 \approx \frac{1}{3}m^3 \end{array} \] from which we conclude that (not surprisingly) most of the computation is in the update of \( A_{22} \). In this analysis, we are counting multiplications and additions separately.
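As a quick sanity check on the approximation, the exact sums can be compared with \( \frac{1}{3}m^3 \) for a modest matrix dimension (the choice \( m = 100 \) below is arbitrary):

```python
# Compare the exact flop counts from the analysis with the m^3/3 estimate.
m = 100
cost_a21 = sum(m - k - 1 for k in range(m))         # updates of a21
cost_A22 = sum((m - k - 1) ** 2 for k in range(m))  # updates of A22
exact = cost_a21 + cost_A22
print(exact, m ** 3 // 3)  # the two agree to within a fraction of a percent
```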
Exercise 9
Instead of storing the Hermitian matrix \( A \) in the lower triangular part of the matrix, one could store it in the upper triangular part and compute an upper triangular matrix \( U \) such that \( A = U^H U \). Propose an algorithm that overwrites the upper triangular part of \( A \) with \( U \).
Exercise 10
Let \( A \) be HPD. Partition \[ A = \left( \begin{array}{c | c} {A_{00} } & { \star } \\ \hline { a_{10}^T } & { \alpha_{11} } \end{array} \right) \] where $ \alpha_{11} $ is a scalar and $ a_{10}^T $ denotes the row vector to its left. Prove that $ A_{00} $ is HPD.
Exercise 11
Let \( A \) be SPD (and thus real valued). Partition \[ A = \left( \begin{array}{c | c} {A_{00} } & { \star } \\ \hline { a_{10}^T } & { \alpha_{11} } \end{array} \right) \] where $ \alpha_{11} $ is a scalar and $ a_{10}^T $ denotes the row vector to its left. Show that $ \alpha_{11} - a_{10}^T A_{00}^{-1} a_{10} > 0 $. (Hint: mimic the argument in the proof of Lemma 7.)
This paper describes a number of unblocked and blocked high-performance algorithms for Cholesky factorization and related operations. While there are many papers written on the subject, this one uses notation similar to that used in this note.