Unit 1.5.1 The Basic Linear Algebra Subprograms
Linear algebra operations are fundamental to computational science. In the 1970s, when vector supercomputers reigned supreme, it was recognized that if applications and software libraries are written in terms of a standardized interface to routines that implement operations with vectors, and vendors of computers provide high-performance instantiations of that interface, then applications would attain portable high performance across different computer platforms. This observation yielded the original Basic Linear Algebra Subprograms (BLAS) interface [8] for Fortran 77, which is now referred to as the level-1 BLAS. The interface was expanded in the 1980s to encompass matrix-vector operations (level-2 BLAS) [3] and matrix-matrix operations (level-3 BLAS) [2].
You should become familiar with the BLAS. Here are some resources:
An overview of the BLAS and how they are used to achieve portable high performance is given in the article [14]:
Robert van de Geijn and Kazushige Goto, BLAS (Basic Linear Algebra Subprograms), Encyclopedia of Parallel Computing, Part 2, pp. 157-164, 2011. If you don't have access, you may want to read an advanced draft.
If you use the BLAS, you should cite one or more of the original papers (as well as the implementation that you use):
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, Basic Linear Algebra Subprograms for Fortran Usage, ACM Transactions on Mathematical Software, Vol. 5, No. 3, pp. 308-323, Sept. 1979.
Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson, An Extended Set of FORTRAN Basic Linear Algebra Subprograms, ACM Transactions on Mathematical Software, Vol. 14, No. 1, pp. 1-17, March 1988.
Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain Duff, A Set of Level 3 Basic Linear Algebra Subprograms, ACM Transactions on Mathematical Software, Vol. 16, No. 1, pp. 1-17, March 1990.
A handy reference guide to the BLAS:
Basic Linear Algebra Subprograms: A Quick Reference Guide.
http://www.netlib.org/blas/blasqr.pdf
There are a number of implementations of the BLAS available for various architectures:

A reference implementation in Fortran is available from
http://www.netlib.org/blas/
This is an unoptimized implementation: it provides the functionality without high performance.
The current recommended high-performance open-source implementation of the BLAS is provided by the BLAS-like Library Instantiation Software (BLIS) discussed in Unit 1.5.2. The techniques you learn in this course underlie the implementation in BLIS.

Different vendors provide their own highperformance implementations:
Intel provides optimized BLAS as part of their Math Kernel Library (MKL):
https://software.intel.com/en-us/mkl
AMD's open-source BLAS for their CPUs can be found at
https://developer.amd.com/amd-cpu-libraries/blas-library/
Their implementation is based on BLIS. AMD also has a BLAS library for its GPUs:
https://github.com/ROCmSoftwarePlatform/rocBLAS
Arm provides optimized BLAS as part of their Arm Performance Libraries:
https://developer.arm.com/products/software-development-tools/hpc/arm-performance-libraries
IBM provides optimized BLAS as part of their Engineering and Scientific Subroutine Library (ESSL):
https://www.ibm.com/support/knowledgecenter/en/SSFHY8/essl_welcome.html
Cray provides optimized BLAS as part of their Cray Scientific Libraries (LibSci):
https://www.cray.com/sites/default/files/SB-Cray-Programming-Environment.pdf
For their GPU accelerators, NVIDIA provides cuBLAS:
https://developer.nvidia.com/cublas