libflame is provided as free software, licensed under the GNU Lesser General Public License (LGPL) in two forms:
We strongly encourage our users to refer to the latest copy of the libflame user's guide for installation instructions and API reference.
FLAME is a methodology for developing dense linear algebra libraries that is radically different from the LINPACK/LAPACK approach that dates back to the 1970s. By libflame we denote the library that has resulted from this project. For addition information, visit the FLAME home page.
The following libflame features benefit both basic and advanced users, as well as library developers:
Figure 1: Blocked Cholesky Factorization (variant 2) expressed as a FLAME algorithm.
FLA_Error FLA_Chol_l_blk_var2( FLA_Obj A, int nb_alg ) { FLA_Obj ATL, ATR, A00, A01, A02, ABL, ABR, A10, A11, A12, A20, A21, A22; int b, value = 0; FLA_Part_2x2( A, &ATL, &ATR, &ABL, &ABR, 0, 0, FLA_TL ); while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) ){ b = min( FLA_Obj_length( ABR ), nb_alg ); FLA_Repart_2x2_to_3x3( ATL, /**/ ATR, &A00, /**/ &A01, &A02, /* ************* */ /* ******************** */ &A10, /**/ &A11, &A12, ABL, /**/ ABR, &A20, /**/ &A21, &A22, b, b, FLA_BR ); /*  */ FLA_Syrk( FLA_LOWER_TRIANGULAR, FLA_NO_TRANSPOSE, FLA_MINUS_ONE, A10, FLA_ONE, A11 ); FLA_Gemm( FLA_NO_TRANSPOSE, FLA_TRANSPOSE, FLA_MINUS_ONE, A20, A10, FLA_ONE, A21 ); value = FLA_Chol_unb_external( FLA_LOWER_TRIANGULAR, A11 ); if ( value != FLA_SUCCESS ) return ( FLA_Obj_length( A00 ) + value ); FLA_Trsm( FLA_RIGHT, FLA_LOWER_TRIANGULAR, FLA_TRANSPOSE, FLA_NONUNIT_DIAG, FLA_ONE, A11, A21 ); /*  */ FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR, A00, A01, /**/ A02, A10, A11, /**/ A12, /* ************** */ /* ****************** */ &ABL, /**/ &ABR, A20, A21, /**/ A22, FLA_TL ); } return value; } 
SUBROUTINE DPOTRF( UPLO, N, A, LDA, INFO ) CHARACTER UPLO INTEGER INFO, LDA, N DOUBLE PRECISION A( LDA, * ) DOUBLE PRECISION ONE PARAMETER ( ONE = 1.0D+0 ) LOGICAL UPPER INTEGER J, JB, NB LOGICAL LSAME INTEGER ILAENV EXTERNAL LSAME, ILAENV EXTERNAL DGEMM, DPOTF2, DSYRK, DTRSM, XERBLA INTRINSIC MAX, MIN INFO = 0 UPPER = LSAME( UPLO, 'U' ) IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN INFO = 1 ELSE IF( N.LT.0 ) THEN INFO = 2 ELSE IF( LDA.LT.MAX( 1, N ) ) THEN INFO = 4 END IF IF( INFO.NE.0 ) THEN CALL XERBLA( 'DPOTRF', INFO ) RETURN END IF INFO = 0 UPPER = LSAME( UPLO, 'U' ) IF( N.EQ.0 ) $ RETURN NB = ILAENV( 1, 'DPOTRF', UPLO, N, 1, 1, 1 ) IF( NB.LE.1 .OR. NB.GE.N ) THEN CALL DPOTF2( UPLO, N, A, LDA, INFO ) ELSE IF( UPPER ) THEN *********** Upper triangular case omited for purposes of fair comparison. ELSE DO 20 J = 1, N, NB JB = MIN( NB, NJ+1 ) CALL DSYRK( 'Lower', 'No transpose', JB, J1, ONE, $ A( J, 1 ), LDA, ONE, A( J, J ), LDA ) CALL DPOTF2( 'Lower', JB, A( J, J ), LDA, INFO ) IF( INFO.NE.0 ) $ GO TO 30 IF( J+JB.LE.N ) THEN CALL DGEMM( 'No transpose', 'Transpose', NJJB+1, JB, $ J1, ONE, A( J+JB, 1 ), LDA, A( J, 1 ), $ LDA, ONE, A( J+JB, J ), LDA ) CALL DTRSM( 'Right', 'Lower', 'Transpose', 'Nonunit', $ NJJB+1, JB, ONE, A( J, J ), LDA, $ A( J+JB, J ), LDA ) END IF 20 CONTINUE END IF END IF GO TO 40 30 CONTINUE INFO = INFO + J  1 40 CONTINUE RETURN END 
Figure 2: FLAME/C code for algorithm shown in Figure 1 (left) representing the style of coding found in libflame, and Fortran77 LAPACK code (right) implementing the same algorithm.
Figure 3: Cholesky Factorization implementations compared on an 8core Opteron system. Notes: For FLAME experiments, LAPACK was used only for the small unblocked Cholesky subproblem. GotoBLAS was configured to provide multithreaded parallelism for level3 BLAS operations. Peak system performance is 38.4 GFLOPS.
Figure 4: Cholesky Factorization implementations compared on a 16 core Itanium2 system. Notes: libflame uses variant 3 while LAPACK uses variant 2. For nonSuperMatrix experiments, GotoBLAS was configured to provide multithreaded parallelism for level3 BLAS operations. For SuperMatrix experiments, GotoBLAS parallelism was disabled. Theoretical peak system performance is 96 GFLOPS.
libflame contains implementations of many operations that are provided by the BLAS and LAPACK libraries. However, not all FLAME implemenations support every datatype. Also, in many cases, we use a different naming convention for our routine names. The following table summarizes which routines are supported within libflame and also provides their corresponding netlib name for reference.
Notes:
operation name 
netlib routine name 
libflame routine name 
FLAME/C 
FLASH 
SuperMatrix 
type support 
l2f support 

libflame routine prefix 
FLA_ 
FLASH_* 
FLASH_ 

Level3 BLAS 

general matrixmatrix multiply 
?gemm 
Gemm 
y 
y 
y 
sdcz 
N/A 
hermitian matrixmatrix multiply 
?hemm 
Hemm 
y 
y 
y 
sdcz 
N/A 
hermitian rankk update 
?herk 
Herk 
y 
y 
y 
sdcz 
N/A 
hermitian rank2k update 
?her2k 
Her2k 
y 
y 
y 
sdcz 
N/A 
symmetric matrixmatrix multiply 
?symm 
Symm 
y 
y 
y 
sdcz 
N/A 
symmetric rankk update 
?syrk 
Syrk 
y 
y 
y 
sdcz 
N/A 
symmetrix rank2k update 
?syr2k 
Syr2k 
y 
y 
y 
sdcz 
N/A 
triangular matrixmatrix multiply 
?trmm 
Trmm 
y 
y 
y 
sdcz 
N/A 
triangular solve with multiple righthand sides 
?trsm 
Trsm 
y 
y 
y 
sdcz 
N/A 
LAPACK 

Cholesky factorization 
?potrf 
Chol 
y 
y 
y 
sdcz 
sdcz 
LU factorization with no pivoting 
~ 
LU_nopiv 
y 
y 
y 
sdcz 
N/A 
LU factorization with partial pivoting 
?getrf 
LU_piv 
y 
y 
y 
sdcz 
sdcz 
LU factorization with incremental pivoting 
~ 
LU_incpiv 
y 
y 
sdcz 
N/A 

QR factorization via the UT transform 
~ 
QR_UT 
y 
sdcz 
sdcz 

QR factorization (incremental) via the UT transform 
~ 
QR_UT_inc 
y 
y 
sdcz 
N/A 

Apply Q from QR_UT or LQ_UT to a righthand side matrix 
~ 
Apply_Q_UT 
y 
sdcz 
sdcz 

Apply Q from QR_UT_inc to a righthand side matrix 
~ 
Apply_Q_UT_inc 
y 
y 
sdcz 
N/A 

LQ factorization via the UT transform 
~ 
LQ_UT 
y 
sdcz 
sdcz 

Trinagular matrix inversion 
?trtri 
Trinv 
y 
y 
y 
sdcz 
sdcz 
triangular transpose matrixmatrix multiply 
?laaum 
Ttmm 
y 
y 
y 
sdcz 
sdcz 
SPD matrix inversion 
?dpotri + 
SPDinv 
y 
y 
y 
sdcz 
sdcz 
Triangular Sylvester equation solve 
?trsyl ^ 
Sylv 
y 
y 
y 
sdcz 
sdcz 
We provide an interface, lapack2flame, which allows legacy codes that link to LAPACK to utilize libflame without any code changes. However, lapack2flame does not provide interfaces to all routines within LAPACK. The column labeled "l2f support" in the above table shows which datatypes are supported for each operation.
Please see the libflame user's guide for the latest system requirements for both GNU/Linux and UNIX, and Windows platforms.
Please see the libflame user's guide for the latest instructions on downloading, configuring, compiling, and installing libflame.
The developers of libflame enthusiastically encourage users to use the GotoBLAS implementation of the Basic Linear Algebra Subprograms (BLAS). To obtain the source code for GotoBLAS, please visit the Texas Advanced Computing Center software site. After downloading perform the following steps:
tar xzf GotoBLAS1.22.tar.gz
cd GotoBLAS
Please read the documentation that accompanies the GotoBLAS source.
Most users may build the GotoBLAS library by running quickbuild.32bit or quickbuild.64bit. Alternately, advanced users may instead view and edit Makefile.rule and then execute:
make lib
ln s libgoto_ITANIUM2r1.10.a libgoto.a
ln s libgoto_ITANIUM2r1.10.a libgoto_ia64.a
We highly recommend using libflame with GotoBLAS! However, libflame will work with any BLAS library. If you want to use libflame with a different BLAS, use the configuretime option disablegotointerfaces before building libflame. If you have further questions about interfacing libflame with your preferred BLAS library, contact flame@cs.utexas.edu.
Please see the libflame user's guide for the latest instructions on linking your legacy, LAPACKdependent application to libflame.
We offer a stepbystep walkthrough for running two example programs included in the libflame source distribution: the first executes a sequential Cholesky factorization with conventional ("flat") matrix storage; the second executes a multithreaded Cholesky factorization using SuperMatrix and hierarchical storage.
We also encourage potential users to browse the code examples provided at our linear algebra wiki.
We have functionality beyond LAPACK. For example, we have routines for updating an LU factorization with pivoting. Adding additional operations is not our top priority at the moment. However, if you have an operation that you would like to see supported, it doesn't hurt to contact us with your request!
We are very insecure people. So, if you like the libraries and find them useful, send us a message! We even make it easy. In the toplevel directory of the libflame distribution, execute:
make sendthanks
This will automatically email us a message!
Contact flame@cs.utexas.edu.
Last Updated on 29 September 2009 by Field G. Van Zee.