Notes on Cholesky Factorization

Notes on Cholesky Factorization

© 2011 by Robert A. van de Geijn
Department of Computer Science
Institute for Computational Engineering and Sciences
The University of Texas at Austin
Remark

Definition and Existence

Definition 1
A matrix \( A \in \mathbb{C}^{m \times m} \) is Hermitian positive definite (HPD) if and only if it is Hermitian (\(A^H = A\)) and for all nonzero vectors \( x \in \mathbb{C}^m \) it is the case that \( x ^H A x > 0 \). If in addition \( A \in \mathbb{R}^{m \times m} \) then \( A \) is said to be symmetric positive definite (SPD).

First some exercises:

Exercise 2
Let \( B \in \mathbb{C}^{m \times n} \) have linearly independent columns. Prove that \( A = B^H B \) is HPD.
Exercise 3
Let \( A \in \mathbb{C}^{m \times m} \). Show that its diagonal elements are real and positive.

We will prove the following theorem later:

Theorem 4 (Cholesky Factorization Theorem)
Given a HPD matrix \( A \) there exists a lower triangular matrix \( L \) such that \( A = L L^H \).

The lower triangular matrix $ L $ is known as the Cholesky factor and \( L L^H \) is known as the Cholesky factorization of \( A \). It is unique if the diagonal elements of \( L \) are restricted to be positive real.

The operation that overwrites the lower triangular part of matrix \( A \) with its Cholesky factor will be denoted by \( A := {\rm Chol}( A ) \), which should be read as ``\( A \) becomes its Cholesky factor.'' Typically, only the lower (or upper) triangular part of \( A \) is stored, and it is that part that is then overwritten with the result. In this discussion, we will assume that the lower triangular part of \( A \) is stored and overwritten.

Application

The Cholesky factorization is used to solve the linear system \( A x = y \) when \( A \) is HPD: Substituting the factors into the equation yields \( L L^H x = y \). Letting \( z = L^H x \), $$ A x = L ( L^H x ) = L z = y . $$ Thus, \( z \) can be computed by solving the triangular system of equations \( L z = y \) and subsequently the desired solution \( x \) can be computed by solving the triangular linear system \( L^H x = z \).

An Algorithm

The most common algorithm for computing $ A := {\rm Chol}(A) $ can be derived as follows:

M-script Code

Code used by Matlab and Octave (a free Matlab) is given by

function [ Aout ] = Chol( A )

% Get the matrix size
n = size( A, 1 ); 

% Loop over the columns of A
for j=1:n

  % alpha11 = sqrt( alpha11)
  A( j, j ) = sqrt( A( j, j ) );

  % a21 = a21 / alpha11
  A( j:1:n, j ) =   A( j:1:n, j ) / A( j,j );

  % A22 = A22 - a21 * a21'  (update only lower triangular part)
  A( j+1:n, j+1:n ) =   A( j+1:n, j+1:n ) - tril( A( j:1:n, j ) * A( j:1:n, j )' );

endfor

Aout = A;

return

or, equivalently,

function [ Aout ] = Chol( A )

% Get the matrix size
n = size( A, 1 ); 

% Loop over the columns of A
for j=1:n

  % alpha11 = sqrt( alpha11)
  A( j, j ) = sqrt( A( j, j ) );

  % a21 = a21 / alpha11
  for i=j+1:n
    A( i, j ) =   A( i, j ) / A( j,j );
  endfor

  % A22 = A22 - a21 * a21'  (update only lower triangular part)
  for k=j+1:n
    for i=k:n
      A( i, k ) =   A( i, k ) - A( i,j ) * A( k,j );
    endfor
  endfor

endfor

Aout = A;

return

Proof of the Cholesky Factorization Theorem

In the below discussion, we again partition \[ A = \left( \begin{array}{c | c} { \alpha_{11} } & { \star } \\ \hline { a_{21} } & { A_{22} } \end{array} \right) \quad \mbox{and} \quad L = \left( \begin{array}{c | c} { \lambda_{11} } & { 0 } \\ \hline { l_{21} } & { L_{22} } \end{array} \right) \]

The following lemmas, which can be found in any standard text, are key to the proof:

Lemma 6
Let \( A \in \mathbb{C}^{nxn} \) be HPD. Then \( \alpha_{11} \) is real and positive.
Proof
This is special case of Exercise 3 .
Lemma 7
Let \( A \in \mathbb{C}^{m \times m} \) be HPD and \( l_{21} = a_{21} / \sqrt{\alpha_{11}} \). Then \( A_{22} - l_{21} l_{21}^H \) is HPD.
Proof

Since \( A \) is Hermitian (clearly) so are \( A_{22} \) and \( A_{22} - l_{21} l_{21}^H \). Let \( x_1 \neq 0 \) be an arbitrary vector of length \( n-1 \). Define \[ x = \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right) \quad \mbox{where } \chi_0 = - a_{21}^H x_1 / \alpha_{11} . \] Then, since \( x \neq 0 \), \[ \begin{array}{r c l} 0 < x^H A x &=& \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right)^H \left( \begin{array}{c|c} { \alpha_{11}} & { a_{21}^H } \\ \hline { a_{21} } & { A_{22} } \end{array} \right) \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right) \\ & = & \left( \begin{array}{c} { \chi_0 } \\ \hline { x_1 } \end{array} \right)^H \left( \begin{array}{c|c} { \alpha_{11} \chi_0 + a_{21}^H x_1 } \\ \hline { a_{21} \chi_0 + A_{22} x_1 } \end{array} \right) \\ &=& \alpha_{11} \vert \chi_0 \vert^2 + \bar \chi_0 a_{21}^H x_1 + x_1^H a_{21} \chi_0 + x_1^H A_{22} x_1 \\ & = & \alpha_{11} \frac{a_{21}^H x_1}{\alpha_{11}} \frac{x_{1}^H a_{21}}{\alpha_{11}} {\color{green} - } \frac{x_{1}^H a_{21}}{\alpha_{11}} a_{21}^H x_1 {\color{green} - } x_1^H a_{21} \frac{a_{21}^H x_1}{\alpha_{11}} + x_1^H A_{22} x_1 \\ &=& x_1^H ( A_{22} - \frac{a_{21} a_{21}^H}{\alpha_{11}} ) x_1 ~~~\mbox{(since \( x_1^H a_{21} a_{21}^H x_1 \) is real and hence equals \( a_{21}^H x_1 x_1^H a_{21} \))}\\ &=& x_1^H ( A_{22} - l_{21} l_{21}^H ) x_1. \end{array} \] We conclude that \( A_{22} - l_{21} l_{21}^H \) is HPD.

Proof of Cholesky Factorization Theorem

Proof by induction.

  • Base case: \( n = 1 \). Clearly the result is true for a \( 1 \times 1 \) matrix \( A = \alpha_{11} \): In this case, the fact that \( A \) is HPD means that \( \alpha_{11} \) is real and positive and a Cholesky factor is then given by \( \lambda_{11} = \sqrt{\alpha_{11}} \), with uniqueness if we insist that \( \lambda_{11} \) is positive.

  • Inductive step: Assume the result is true for HPD matrix \( A \in \mathbb{C}^{(n-1) \times (n-1)} \). We will show that it holds for \( A \in \mathbb{C}^{n \times n} \). Let \( A \in \mathbb{C}^{n \times n} \) be HPD. Let \( \lambda_{11} = \sqrt{\alpha_{11}} \) (which is well-defined by Lemma 6), \( l_{21} = a_{21} / \lambda_{11} \), and \( L_{22} = {\rm Chol}( A_{22} - l_{21} l_{21}^H ) \) (which exists as a consequence of the Inductive Hypothesis and Lemma 7). Then \( L \) is the desired Cholesky factor of \( A \).

  • By the principle of mathematical induction, the theorem holds.
  • A Blocked Algorithm

    In order to attain high performance, the computation is cast in terms of matrix-matrix multiplication by so-called blocked algorithms. For the Cholesky factorization a blocked version of the algorithm can be derived by partitioning \[ A \rightarrow A = \left( \begin{array}{c | c} { A_{11} } & { \star } \\ \hline { A_{21} } & { A_{22} } \end{array} \right) \quad \mbox{and} \quad L \rightarrow \left( \begin{array}{c | c} { L_{11} } & { 0 } \\ \hline { L_{21} } & { L_{22} } \end{array} \right), \] where \( A_{11} \) and \( L_{11} \) are \( b \times b \). By substituting these partitioned matrices into \( A = L L^H \) we find that \[ \left( \begin{array}{c | c} { A_{11} }&{ \star } \\ \hline { A_{21} }&{ A_{22} } \end{array} \right) = \left( \begin{array}{c | c} { L_{11} }&{ 0 } \\ \hline { L_{21} }&{ L_{22} } \end{array} \right) \left( \begin{array}{c | c} { L_{11} }&{ 0 } \\ \hline { L_{21} }&{ L_{22} } \end{array} \right)^H = \left( \begin{array}{c | c} { L_{11} L_{11}^H } & { \star } \\ \hline { L_{21} L_{11}^H } & { L_{21} L_{21}^H + L_{22}^{\phantom{{}^{1}}} L_{22}^H } \end{array} \right). \] From this we conclude that \[ \begin{array}{c | c} { L_{11} = {\rm Chol}( A_{11} ) } & \\ \hline { L_{21} = A_{21} L_{11}^{-H} } & { L_{22}^{\phantom{{}^{1}}} = {\rm Chol}( A_{22} - L_{21} L_{21}^H ) }. \end{array} \] These equalities motivate the algorithm

    Algorithm 8 (Blocked Cholesky factorization)

    1. Partition \( A \rightarrow \left( \begin{array}{c | c} { A_{11} } & { \star } \\ \hline { A_{21} } & { A_{22} } \end{array} \right) \).

    2. Overwrite \( A_{11} := L_{11} = {\rm Chol}( A_{11} ) \).

    3. Overwrite \( A_{21} := L_{21} = A_{21} L_{11}^{-H} \).

    4. Overwrite \( A_{22} := A_{22} - L_{21} L_{21}^H \)
      (updating only the lower triangular part of \( A_{22} \)).

    5. Continue with \( A = A_{22} \). (Back to Step 1.)

    Remark
    The Cholesky factorization \( A_{11} := L_{11} = {\rm Chol}( A_{11} ) \) can be computed with the unblocked algorithm or by calling the blocked Cholesky factorization algorithm recursively. Operations like \( L_{21} = A_{21} L_{11}^{-H} \) are computed by solving the equivalent linear system with multiple right-hand sides \( L_{11} L_{21}^H = A_{21}^H \).

    Cost

    The cost of the Cholesky factorization of \( A \in \mathbb{C}^{m \times m} \) can be analyzed as follows: In Algorithm 5 during the \( k \)th iteration (starting \( k \) at zero) \( A_{00} \) is \( k \times k \). Thus, the operations in that iteration cost

    Thus, the total cost in flops is given by \[ \begin{array}{rcl} C_{\rm Chol}( m )& \approx& \mbox{Cost due to update of $A_{22}$} + \mbox{Cost due to update of $a_{21}$} \\ & = & \sum_{k=0}^{m-1} (m-k-1)^2 + \sum_{k=0}^{m-1} (m-k-1) \\ &=& \sum_{j=0}^{m-1} j^2 + \sum_{j=0}^{m-1} j \approx \frac{1}{3}m^3 + \frac{1}{2} m^2 \approx \frac{1}{3}m^3 \end{array} \] which allows us to state that (obvious) most computation is in the update of \( A_{22} \). In this analysis, we are counting multiplications and additions separately.

    Additional Exercises

    Exercise 9
    Instead of storing the Hermitian matrix \( A \) in the lower triangular part of the matrix, one could store it in the upper triangular part and compute upper triangular matrix \( U \) such that \( A = U^H U \). Propose an algorithm that overwrite the upper triangular part of \( A \) with \( U \).
    Exercise 10
    Let \( A \) be HPD. Partition \[ A = \left( \begin{array}{c | c} {A_{00} } & { \star } \\ \hline { a_{10}^T } & { \alpha_{11} } \end{array} \right) \] where $ \alpha_{11} $ is a scalar and $ a_{10}^T $ denotes the row vector to its left. Prove that $ A_{00} $ is HPD.
    Exercise 11
    Let \( A \) be SPD (and thus real valued). Partition \[ A = \left( \begin{array}{c | c} {A_{00} } & { \star } \\ \hline { a_{10}^T } & { \alpha_{11} } \end{array} \right) \] where $ \alpha_{11} $ is a scalar and $ a_{10}^T $ denotes the row vector to its left. Show that $ \alpha_{11} - a_{10}^T A_{00}^{-1} a_{10} > 0 $. (Hint: mimic the argument in the proof of Lemma 7.)

    Additional Reading

    1. Paolo Bientinesi, Brian Gunter, and Robert van de Geijn. "Families of Algorithms Related to the Inversion of a Symmetric Positive Definite Matrix." ACM Transactions on Mathematical Software 35(1), pp. 1-22, 2008.

      This paper describes a number of unblocked and blocked high-performance algorithm for Cholesky factorization and related operations. While there are many papers written on the subject, this paper uses notation similar to that used in this note.


    Last modified: Jan 13, 2011 . Send comments to rvdg at cs.utexas.edu