next up previous contents
Next: 1.3 Physically Based Matrix Up: 1 Introduction Previous: 1.1 Why a New

1.2 Natural Description of Linear Algebra Algorithms

   

Let us consider the simple example of implementation of the   Cholesky factorization, which we will revisit in Chapter gif. Given a symmetric tex2html_wrap_inline12289 positive definite matrix A , the Cholesky factorization of A is given by

displaymath12285

where L is lower triangular.

The algorithm for implementing this operation can be described by partitioning (blocking) the matrices

displaymath12286

where tex2html_wrap_inline12297 and tex2html_wrap_inline12299 are scalars, and tex2html_wrap_inline12301 and tex2html_wrap_inline12303 are vectors of length n-1 . The tex2html_wrap_inline12307 indicates the symmetric part of A . Now,

eqnarray201

This in turn yields the equations

eqnarray231

We now conclude that the following steps will implement the Cholesky factorization, overwriting the lower triangular portion of A with L :

We make a few observations:
The primary opportunity for parallelism is in the   symmetric rank-1 update: tex2html_wrap_inline12325 .

Using the PLAPACK infrastructure, parallel code for the above algorithm is given by

PLA_Obj_view_all( a, &acur );
while ( TRUE ){
     PLA_Obj_global_length( acur, &size );
     if ( 0 == size ) break;
     PLA_Obj_split_4( acur, 1, 1, &a11, &a12,
                                  &a21, &acur );
     Take_sqrt( a11 );
     PLA_Inv_scal( a11, a21 );
     PLA_Syr( PLA_LOW_TRIAN, min_one, a21, acur );
}
Previewing the rest of the book, all information that describes A , its data, and how it is distributed to processors (nodes) is encoded in an   object (data structure) referenced by a. The PLA_Obj_view_all creates a second reference, acur, into the same data. The PLA_Obj_global_length call extracts the current size of the matrix that acur references. If this is zero, the recursion has completed. The call to PLA_Obj_split_4 partitions the matrix, creating new references for the four quadrants. Element tex2html_wrap_inline12329 is updated by taking its square root in subroutine Take_sqrt, tex2html_wrap_inline12331 is scaled by tex2html_wrap_inline12333 by calling PLA_Inv_scal, and finally the symmetric rank-1 update is achieved through the call to PLA_Syr. This sets up the next iteration, which is also the next level of recursion in our original algorithm.

In order to understand how parallelism is achieved in the above calls, we must first examine how PLAPACK distributes vectors and matrices, in Section 1.3. After this, we show how operations like the symmetric rank-1 update can be naturally parallelized, in Sections 1.5 and 1.6.


next up previous contents
Next: 1.3 Physically Based Matrix Up: 1 Introduction Previous: 1.1 Why a New

rvdg@cs.utexas.edu