SCALABILITY ISSUES AFFECTING THE DESIGN OF
A DENSE LINEAR ALGEBRA LIBRARY
- Jack J. Dongarra
- Department of Computer Science
- University of Tennessee
- 107 Ayres Hall
- Knoxville, TN 37996-1301
- and
- Mathematical Sciences Section
- Oak Ridge National Laboratory
- P. O. Box 2008, Bldg. 6012
- Oak Ridge, TN 37831-6367
- Robert A. van de Geijn
- Department of Computer Sciences
- University of Texas
- Austin, TX 78712
- David W. Walker
- Mathematical Sciences Section
- Oak Ridge National Laboratory
- P. O. Box 2008, Bldg. 6012
- Oak Ridge, TN 37831-6367
Abstract
This paper discusses the scalability of Cholesky, LU, and QR factorization
routines on MIMD distributed memory concurrent computers. These routines
form part of the ScaLAPACK mathematical software library that extends the
widely-used LAPACK library to run efficiently on scalable concurrent computers.
To ensure good scalability and performance, the ScaLAPACK routines are based
on block-partitioned algorithms that reduce the frequency of data movement
between different levels of the memory hierarchy, and particularly between
processors. The block cyclic data distribution, that is used in all three
factorization algorithms, is described. An outline of the sequential and
parallel block-partitioned algorithms is given. Approximate models of
algorithms' performance are presented to indicate which factors in the design
of the algorithm have an impact upon scalability. These models are
compared with timings results on a 128-node Intel iPSC/860 hypercube. It
is shown that the routines are highly scalable
on this machine for problems that occupy more than about 25\% of the memory
on each processor, and that the measured timings are consistent with the
performance model.
The contribution of this paper goes beyond reporting our
experience: our implementations are available in the public domain.
Jack Dongarra, Robert van de Geijn, and David Walker, ``Scalability Issues
Affecting the Design of a Dense Linear Algebra Library,'' Special Issue on
Scalability of Parallel Algorithms, Journal of Parallel and Distributed
Computing , Vol. 22, No. 3, Sept. 1994.