SELECTED PLAPACK IMPLEMENTATIONS OF MATRIX OPERATIONS
Index

Parallel Level-2 BLAS:
- PLA_Gemv: Parallel General Matrix-Vector Multiplication

Parallel Level-3 BLAS:
- PLA_Gemm: Parallel General Matrix-Matrix Multiplication

Parallel Factorization Routines:
- PLA_Chol: Parallel Cholesky Factorization
PLA_Gemv: General Matrix-Vector Multiplication
The best way to justify the Abstract Programming Interface used by
PLAPACK is to look at how a parallel implementation looks when it is
coded in a more traditional fashion.
Compare with the corresponding ScaLAPACK code:

- pdgemv_.c: ScaLAPACK Parallel BLAS (PBLAS) routine

- pbdgemv.f: ScaLAPACK Parallel Blocked BLAS (PBBLAS) routine
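
For contrast, a complete PLAPACK-style driver for computing y := A x
can be quite short. The sketch below is assembled from the calling
conventions in the Users' Guide; the exact signatures of PLA_Init,
PLA_Temp_create, PLA_Matrix_create, PLA_Mvector_create, and PLA_Gemv
should be checked against the distribution, so treat it as annotated
pseudocode rather than the shipped API.

    /*
     * Sketch: compute y := alpha * A * x + beta * y with PLA_Gemv.
     * Routine names and signatures are recalled from the Users'
     * Guide and may be inexact -- annotated pseudocode only.
     */
    #include "mpi.h"
    #include "PLA.h"

    int main( int argc, char *argv[] )
    {
      MPI_Comm comm;
      PLA_Temp templ;
      PLA_Obj  A = NULL, x = NULL, y = NULL,
               minus_one = NULL, zero = NULL, one = NULL;
      int      n = 1000, nb_distr = 64;

      MPI_Init( &argc, &argv );
      /* Map the MPI ranks onto a 2D mesh (here assumed to be 2 x 2). */
      PLA_Comm_1D_to_2D( MPI_COMM_WORLD, 2, 2, &comm );
      PLA_Init( comm );

      /* The template object captures the distribution block size. */
      PLA_Temp_create( nb_distr, 0, &templ );

      /* Distributed n x n matrix and two multivectors of width 1. */
      PLA_Matrix_create( MPI_DOUBLE, n, n, templ,
                         PLA_ALIGN_FIRST, PLA_ALIGN_FIRST, &A );
      PLA_Mvector_create( MPI_DOUBLE, n, 1, templ, PLA_ALIGN_FIRST, &x );
      PLA_Mvector_create( MPI_DOUBLE, n, 1, templ, PLA_ALIGN_FIRST, &y );
      /* Duplicated scalar constants -1, 0, 1 conformal to A. */
      PLA_Create_constants_conf_to( A, &minus_one, &zero, &one );

      /* ... fill A and x here, e.g. through the PLA_API interface ... */

      /* y := 1 * A * x + 0 * y, i.e. y := A x. */
      PLA_Gemv( PLA_NO_TRANSPOSE, one, A, x, zero, y );

      PLA_Obj_free( &A );    PLA_Obj_free( &x );   PLA_Obj_free( &y );
      PLA_Obj_free( &zero ); PLA_Obj_free( &one ); PLA_Obj_free( &minus_one );
      PLA_Temp_free( &templ );
      PLA_Finalize( );
      MPI_Finalize( );
      return 0;
    }

Note that the distribution details live entirely in the template
object; the call to PLA_Gemv itself reads like the sequential BLAS.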
PLA_Gemm: General Matrix-Matrix Multiplication

Main routine:
PLA_Gemm.c
This routine chooses among three different algorithms, depending
on the shapes of the matrices involved; a sketch of the dispatch
logic follows the list.
- If matrix C contains most of the data, it is left in place and
A and B are communicated. The algorithm is implemented
as a sequence of rank-k updates.
- If matrix A contains most of the data, it is left in place and
B and C are communicated. The algorithm is implemented
as a sequence of matrix-panel (of columns) multiplies.
- If matrix B contains most of the data, it is left in place and
A and C are communicated. The algorithm is implemented
as a sequence of panel (of rows)-matrix multiplies.
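
The selection logic itself is simple. The following C sketch
illustrates it; the inquiry routines are PLAPACK-style, but the three
variant drivers (PLA_Gemm_C_stationary and friends) are hypothetical
names introduced here for illustration, not actual PLAPACK routines.

    /*
     * Sketch of the shape-based dispatch described above.  The
     * PLA_Gemm_*_stationary drivers are hypothetical names.
     */
    #include "PLA.h"

    int PLA_Gemm_C_stationary( int, int, PLA_Obj, PLA_Obj, PLA_Obj, PLA_Obj, PLA_Obj );
    int PLA_Gemm_A_stationary( int, int, PLA_Obj, PLA_Obj, PLA_Obj, PLA_Obj, PLA_Obj );
    int PLA_Gemm_B_stationary( int, int, PLA_Obj, PLA_Obj, PLA_Obj, PLA_Obj, PLA_Obj );

    /* Total number of elements in the global (distributed) object. */
    static long size_of( PLA_Obj X )
    {
      int m, n;
      PLA_Obj_global_length( X, &m );
      PLA_Obj_global_width ( X, &n );
      return (long) m * (long) n;
    }

    int PLA_Gemm_dispatch( int transa, int transb, PLA_Obj alpha,
                           PLA_Obj A, PLA_Obj B, PLA_Obj beta, PLA_Obj C )
    {
      long sa = size_of( A ), sb = size_of( B ), sc = size_of( C );

      if ( sc >= sa && sc >= sb )
        /* C stays in place: sequence of rank-k updates. */
        return PLA_Gemm_C_stationary( transa, transb, alpha, A, B, beta, C );
      else if ( sa >= sb )
        /* A stays in place: matrix-panel (of columns) multiplies. */
        return PLA_Gemm_A_stationary( transa, transb, alpha, A, B, beta, C );
      else
        /* B stays in place: panel (of rows)-matrix multiplies. */
        return PLA_Gemm_B_stationary( transa, transb, alpha, A, B, beta, C );
    }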

Parameter checking:
PLA_Gemm_enter_exit.c
Again, compare with the corresponding ScaLAPACK code:

- pdgemm_.c: ScaLAPACK Parallel BLAS (PBLAS) routine

- pbdgemm.f: ScaLAPACK Parallel Blocked BLAS (PBBLAS) routine
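
The stationary-C, rank-k variant above is essentially SUMMA (van de
Geijn and Watts; see the references below). As a self-contained
illustration of its broadcast-broadcast-multiply structure, here is a
minimal MPI sketch for a square process grid with one block per
process; all names are introduced here, and the real algorithm
streams panels of a tunable width nb rather than whole blocks.

    /*
     * Minimal SUMMA sketch: C = A * B on a q x q process grid, each
     * rank owning one nloc x nloc block of A, B, and C (row-major).
     * Simplified block variant for illustration only.
     */
    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    /* C_loc += Abuf * Bbuf, all nloc x nloc, row-major. */
    static void local_mm( int nloc, const double *Abuf,
                          const double *Bbuf, double *C_loc )
    {
      for ( int i = 0; i < nloc; i++ )
        for ( int k = 0; k < nloc; k++ ) {
          double aik = Abuf[ i*nloc + k ];
          for ( int j = 0; j < nloc; j++ )
            C_loc[ i*nloc + j ] += aik * Bbuf[ k*nloc + j ];
        }
    }

    void summa( int q, int nloc, MPI_Comm comm,
                const double *A_loc, const double *B_loc, double *C_loc )
    {
      int rank, myrow, mycol;
      MPI_Comm row_comm, col_comm;
      double *Abuf = malloc( (size_t)nloc*nloc*sizeof(double) );
      double *Bbuf = malloc( (size_t)nloc*nloc*sizeof(double) );

      MPI_Comm_rank( comm, &rank );
      myrow = rank / q;  mycol = rank % q;
      /* Communicators along my process row and process column. */
      MPI_Comm_split( comm, myrow, mycol, &row_comm );
      MPI_Comm_split( comm, mycol, myrow, &col_comm );

      memset( C_loc, 0, (size_t)nloc*nloc*sizeof(double) );
      for ( int k = 0; k < q; k++ ) {
        /* Owner in grid column k broadcasts its A block along the row... */
        if ( mycol == k ) memcpy( Abuf, A_loc, (size_t)nloc*nloc*sizeof(double) );
        MPI_Bcast( Abuf, nloc*nloc, MPI_DOUBLE, k, row_comm );
        /* ...and the owner in grid row k broadcasts its B block down the column. */
        if ( myrow == k ) memcpy( Bbuf, B_loc, (size_t)nloc*nloc*sizeof(double) );
        MPI_Bcast( Bbuf, nloc*nloc, MPI_DOUBLE, k, col_comm );
        local_mm( nloc, Abuf, Bbuf, C_loc );   /* rank-nloc update of C_loc */
      }
      free( Abuf );  free( Bbuf );
      MPI_Comm_free( &row_comm );  MPI_Comm_free( &col_comm );
    }

Each of the q steps broadcasts one block of A along each process row
and one block of B down each process column, then performs a local
rank-nloc update; pipelining these broadcasts is a key point of the
SUMMA paper.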
References:
- R. van de Geijn, Using PLAPACK (Users' Guide), The MIT Press, 1997.
- Robert van de Geijn and Jerrell Watts, "SUMMA: Scalable Universal
Matrix Multiplication Algorithm," Concurrency: Practice and
Experience, Vol. 9 (4), pp. 255-274, April 1997.
- John Gunnels, Calvin Lin, Greg Morrow, and Robert van de Geijn,
"A Flexible Class of Parallel Matrix Multiplication Algorithms,"
Proceedings of the First Merged International Parallel Processing
Symposium and Symposium on Parallel and Distributed Processing
(1998 IPPS/SPDP '98), pp. 110-116, 1998.
PLA_Chol: Cholesky Factorization

Main routine:
PLA_Chol.c

Parameter checking:
PLA_Chol_enter_exit.c

A much simpler implementation really shows off how a PLAPACK
implementation is just a direct translation of how the algorithm is
naturally expressed; in outline it looks like the sketch below.
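
This is a sketch assuming the object-manipulation conventions of the
Users' Guide, not the actual contents of PLA_Chol.c; the signatures
of PLA_Obj_split_4, PLA_Chol, PLA_Trsm, and PLA_Syrk are recalled
from memory and may be inexact.

    /*
     * Sketch of a blocked right-looking Cholesky factorization
     * (lower triangular case) in the PLAPACK style.  Partition
     *
     *     A = ( A11  *   )     A11 := L11 = Chol( A11 )
     *         ( A21  A22 ),    A21 := L21 = A21 * inv( L11' )
     *                          A22 := A22 - L21 * L21'
     *
     * and repeat on A22.  Annotated pseudocode only.
     */
    #include "PLA.h"

    int Chol_blocked( int nb_alg, PLA_Obj A )
    {
      PLA_Obj ABR = NULL, A11 = NULL, A12 = NULL, A21 = NULL,
              minus_one = NULL, zero = NULL, one = NULL;
      int     size, b;

      PLA_Create_constants_conf_to( A, &minus_one, &zero, &one );
      PLA_Obj_view_all( A, &ABR );      /* ABR views the unfactored part */

      while ( 1 ) {
        PLA_Obj_global_length( ABR, &size );
        if ( size == 0 ) break;
        b = ( nb_alg < size ? nb_alg : size );

        /* Split off the leading b x b block and the panel below it;
           ABR is rebound to the trailing submatrix A22. */
        PLA_Obj_split_4( ABR, b, b, &A11, &A12,
                                    &A21, &ABR );

        /* Factor the diagonal block (the real code uses an unblocked
           or local factorization at this step). */
        PLA_Chol( PLA_LOWER_TRIANGULAR, A11 );
        /* A21 := A21 * inv( L11' ): triangular solve with the panel. */
        PLA_Trsm( PLA_SIDE_RIGHT, PLA_LOWER_TRIANGULAR,
                  PLA_TRANSPOSE, PLA_NONUNIT_DIAG, one, A11, A21 );
        /* A22 := A22 - L21 * L21': symmetric rank-b update. */
        PLA_Syrk( PLA_LOWER_TRIANGULAR, PLA_NO_TRANSPOSE,
                  minus_one, A21, one, ABR );
      }

      PLA_Obj_free( &ABR ); PLA_Obj_free( &A11 );
      PLA_Obj_free( &A12 ); PLA_Obj_free( &A21 );
      PLA_Obj_free( &minus_one ); PLA_Obj_free( &zero ); PLA_Obj_free( &one );
      return 0;   /* success */
    }

Note how the loop body mirrors the three-line mathematical
description of the blocked algorithm.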
Again, compare with the corresponding ScaLAPACK code:

- pdpotrf.c: ScaLAPACK Blocked Cholesky Factorization

- pdpotf2.c: ScaLAPACK Unblocked Cholesky Factorization (needed by the blocked factorization)
References:
- R. van de Geijn, Using PLAPACK (Users' Guide), The MIT Press, 1997.
- Greg Morrow and Robert van de Geijn, "Zen and the Art of
High-Performance Parallel Computing."
- Greg Baker, John Gunnels, Greg Morrow, Beatrice Riviere, and Robert
van de Geijn, "PLAPACK: High Performance through High Level
Abstraction," ICPP '98.
- Philip Alpatov, Greg Baker, Carter Edwards, John Gunnels, Greg
Morrow, James Overfelt, Robert van de Geijn, and Yuan-Jye J. Wu,
"PLAPACK: Parallel Linear Algebra Libraries Design Overview," SC97.
- Philip Alpatov, Greg Baker, Carter Edwards, John Gunnels, Greg
Morrow, James Overfelt, Robert van de Geijn, and Yuan-Jye J. Wu,
"PLAPACK: Parallel Linear Algebra Package," in Proceedings of the
SIAM Parallel Processing Conference, 1997.
Send mail to plapack@cs.utexas.edu
Last Updated: Feb. 8, 2000