libflame download page


Source code

libflame is provided as free software, licensed under the GNU Lesser General Public License (LGPL) in two forms:


Reference guide

We strongly encourage our users to refer to the latest copy of the libflame user's guide for installation instructions and API reference.


What is libflame?

FLAME is a methodology for developing dense linear algebra libraries that is radically different from the LINPACK/LAPACK approach that dates back to the 1970s. By libflame we denote the library that has resulted from this project. For addition information, visit the FLAME home page.


What's provided by libflame?

The following libflame features benefit both basic and advanced users, as well as library developers:


FLAME Chol

Figure 1: Blocked Cholesky Factorization (variant 2) expressed as a FLAME algorithm.


FLA_Error FLA_Chol_l_blk_var2( FLA_Obj A, int nb_alg )
{
  FLA_Obj ATL,   ATR,      A00, A01, A02,
          ABL,   ABR,      A10, A11, A12,
                           A20, A21, A22;
  int b, value = 0;

  FLA_Part_2x2( A,    &ATL, &ATR,
                      &ABL, &ABR,     0, 0, FLA_TL );

  while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) ){

    b = min( FLA_Obj_length( ABR ), nb_alg );

    FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                        /* ************* */   /* ******************** */
                                                &A10, /**/ &A11, &A12,
                           ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                           b, b, FLA_BR );

    /* ---------------------------------------------------------------- */

    FLA_Syrk( FLA_LOWER_TRIANGULAR, FLA_NO_TRANSPOSE, 
              FLA_MINUS_ONE, A10, FLA_ONE, A11 );

    FLA_Gemm( FLA_NO_TRANSPOSE, FLA_TRANSPOSE, 
              FLA_MINUS_ONE, A20, A10, FLA_ONE, A21 );

    value = FLA_Chol_unb_external( FLA_LOWER_TRIANGULAR, A11 );

    if ( value != FLA_SUCCESS )
      return ( FLA_Obj_length( A00 ) + value );

    FLA_Trsm( FLA_RIGHT, FLA_LOWER_TRIANGULAR, 
              FLA_TRANSPOSE, FLA_NONUNIT_DIAG, 
              FLA_ONE, A11, A21 );

    /* ---------------------------------------------------------------- */

    FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                     A10, A11, /**/ A12,
                            /* ************** */  /* ****************** */
                              &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                              FLA_TL );
  }

  return value;
}
      SUBROUTINE DPOTRF( UPLO, N, A, LDA, INFO )

      CHARACTER          UPLO
      INTEGER            INFO, LDA, N
      DOUBLE PRECISION   A( LDA, * )

      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
      LOGICAL            UPPER
      INTEGER            J, JB, NB
      LOGICAL            LSAME
      INTEGER            ILAENV
      EXTERNAL           LSAME, ILAENV
      EXTERNAL           DGEMM, DPOTF2, DSYRK, DTRSM, XERBLA
      INTRINSIC          MAX, MIN

      INFO = 0
      UPPER = LSAME( UPLO, 'U' )
      IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN
         INFO = -1
      ELSE IF( N.LT.0 ) THEN
         INFO = -2
      ELSE IF( LDA.LT.MAX( 1, N ) ) THEN
         INFO = -4
      END IF
      IF( INFO.NE.0 ) THEN
         CALL XERBLA( 'DPOTRF', -INFO )
         RETURN
      END IF

      INFO = 0
      UPPER = LSAME( UPLO, 'U' )

      IF( N.EQ.0 )
     $   RETURN

      NB = ILAENV( 1, 'DPOTRF', UPLO, N, -1, -1, -1 )
      IF( NB.LE.1 .OR. NB.GE.N ) THEN
         CALL DPOTF2( UPLO, N, A, LDA, INFO )
      ELSE
         IF( UPPER ) THEN    
*********** Upper triangular case omited for purposes of fair comparison.
         ELSE
            DO 20 J = 1, N, NB
               JB = MIN( NB, N-J+1 )
               CALL DSYRK( 'Lower', 'No transpose', JB, J-1, -ONE,
     $                     A( J, 1 ), LDA, ONE, A( J, J ), LDA )
               CALL DPOTF2( 'Lower', JB, A( J, J ), LDA, INFO )
               IF( INFO.NE.0 )
     $            GO TO 30
               IF( J+JB.LE.N ) THEN
                  CALL DGEMM( 'No transpose', 'Transpose', N-J-JB+1, JB,
     $                        J-1, -ONE, A( J+JB, 1 ), LDA, A( J, 1 ),
     $                        LDA, ONE, A( J+JB, J ), LDA )
                  CALL DTRSM( 'Right', 'Lower', 'Transpose', 'Non-unit',
     $                        N-J-JB+1, JB, ONE, A( J, J ), LDA,
     $                        A( J+JB, J ), LDA )
               END IF
   20       CONTINUE
         END IF
      END IF
      GO TO 40
   30 CONTINUE
      INFO = INFO + J - 1
   40 CONTINUE
      RETURN
      END

Figure 2: FLAME/C code for algorithm shown in Figure 1 (left) representing the style of coding found in libflame, and Fortran-77 LAPACK code (right) implementing the same algorithm.


chol_l_opteron

Figure 3: Cholesky Factorization implementations compared on an 8-core Opteron system. Notes: For FLAME experiments, LAPACK was used only for the small unblocked Cholesky subproblem. GotoBLAS was configured to provide multithreaded parallelism for level-3 BLAS operations. Peak system performance is 38.4 GFLOPS.


chol_l_itanium2

Figure 4: Cholesky Factorization implementations compared on a 16 core Itanium2 system. Notes: libflame uses variant 3 while LAPACK uses variant 2. For non-SuperMatrix experiments, GotoBLAS was configured to provide multithreaded parallelism for level-3 BLAS operations. For SuperMatrix experiments, GotoBLAS parallelism was disabled. Theoretical peak system performance is 96 GFLOPS.


What's new in libflame?

We've added lots of functionality since libflame 3.0 was released on May 1, 2009. Here is a basic summary:

Library API and implementations

Build system


Status of operation support

libflame contains implementations of many operations that are provided by the BLAS and LAPACK libraries. However, not all FLAME implemenations support every datatype. Also, in many cases, we use a different naming convention for our routine names. The following table summarizes which routines are supported within libflame and also provides their corresponding netlib name for reference.

Notes:


operation name
netlib routine name
libflame routine name
FLAME/C
FLASH
SuperMatrix
type support
l2f support
libflame routine prefix


FLA_
FLASH_*
FLASH_


Level-3 BLAS







general matrix-matrix multiply
?gemm
Gemm
y
y
y
sdcz
N/A
hermitian matrix-matrix multiply
?hemm
Hemm
y
y
y
sdcz
N/A
hermitian rank-k update
?herk
Herk
y
y
y
sdcz
N/A
hermitian rank-2k update
?her2k
Her2k
y
y
y
sdcz
N/A
symmetric matrix-matrix multiply
?symm
Symm
y
y
y
sdcz
N/A
symmetric rank-k update
?syrk
Syrk
y
y
y
sdcz
N/A
symmetrix rank-2k update
?syr2k
Syr2k
y
y
y
sdcz
N/A
triangular matrix-matrix multiply
?trmm
Trmm
y
y
y
sdcz
N/A
triangular solve with multiple right-hand sides
?trsm
Trsm
y
y
y
sdcz
N/A
LAPACK







Cholesky factorization
?potrf
Chol
y
y
y
sdcz
sdcz
LU factorization with no pivoting
~
LU_nopiv
y
y
y
sdcz
N/A
LU factorization with partial pivoting
?getrf
LU_piv
y
y
y
sdcz
sdcz
LU factorization with incremental pivoting
~
LU_incpiv

y
y
sdcz
N/A
QR factorization via the UT transform
~
QR_UT
y


sdcz
sdcz
QR factorization (incremental) via the UT transform
~
QR_UT_inc

y
y
sdcz
N/A
Apply Q from QR_UT or LQ_UT to a right-hand side matrix
~
Apply_Q_UT
y


sdcz
sdcz
Apply Q from QR_UT_inc to a right-hand side matrix
~
Apply_Q_UT_inc

y
y
sdcz
N/A
LQ factorization via the UT transform
~
LQ_UT
y


sdcz
sdcz
Trinagular matrix inversion
?trtri
Trinv
y
y
y
sdcz
sdcz
triangular transpose matrix-matrix multiply
?laaum
Ttmm
y
y
y
sdcz
sdcz
SPD matrix inversion
?dpotri +
SPDinv
y
y
y
sdcz
sdcz
Triangular Sylvester equation solve
?trsyl ^
Sylv
y
y
y
sdcz
sdcz


LAPACK compatibility support in libflame

We provide an interface, lapack2flame, which allows legacy codes that link to LAPACK to utilize libflame without any code changes. However, lapack2flame does not provide interfaces to all routines within LAPACK. The column labeled "l2f support" in the above table shows which datatypes are supported for each operation.


System and software requirements

Please see the libflame user's guide for the latest system requirements for both GNU/Linux and UNIX, and Windows platforms.


Building and Installing libflame

Please see the libflame user's guide for the latest instructions on downloading, configuring, compiling, and installing libflame.


Building and installing GotoBLAS

The developers of libflame enthusiastically encourage users to use the GotoBLAS implementation of the Basic Linear Algebra Subprograms (BLAS). To obtain the source code for GotoBLAS, please visit the Texas Advanced Computing Center software site. After downloading perform the following steps:

  1. tar xzf GotoBLAS-1.22.tar.gz

  2. cd GotoBLAS

  3. Please read the documentation that accompanies the GotoBLAS source.

  4. Most users may build the GotoBLAS library by running quickbuild.32bit or quickbuild.64bit. Alternately, advanced users may instead view and edit Makefile.rule and then execute:

    make lib

  5. Copy the library archive to a more permanent directory. You should also symbolically link the libgoto library to an abbreviated name:

    ln -s libgoto_ITANIUM2-r1.10.a libgoto.a

  6. If multiple architecture builds of libgoto share the same directory, then you should include an architecture substring in the symbolic link name to differentiate the builds:

    ln -s libgoto_ITANIUM2-r1.10.a libgoto_ia64.a

We highly recommend using libflame with GotoBLAS! However, libflame will work with any BLAS library. If you want to use libflame with a different BLAS, use the configure-time option --disable-goto-interfaces before building libflame. If you have further questions about interfacing libflame with your preferred BLAS library, contact flame@cs.utexas.edu.


Linking your LAPACK dependent application to libflame

Please see the libflame user's guide for the latest instructions on linking your legacy, LAPACK-dependent application to libflame.


Running an Example

We offer a step-by-step walkthrough for running two example programs included in the libflame source distribution: the first executes a sequential Cholesky factorization with conventional ("flat") matrix storage; the second executes a multithreaded Cholesky factorization using SuperMatrix and hierarchical storage.

We also encourage potential users to browse the code examples provided at our linear algebra wiki.


Beyond LAPACK

We have functionality beyond LAPACK. For example, we have routines for updating an LU factorization with pivoting. Adding additional operations is not our top priority at the moment. However, if you have an operation that you would like to see supported, it doesn't hurt to contact us with your request!


Thank us!

We are very insecure people. So, if you like the libraries and find them useful, send us a message! We even make it easy. In the top-level directory of the libflame distribution, execute:

make send-thanks

This will automatically e-mail us a message!


Questions?

Contact flame@cs.utexas.edu.


Last Updated on 29 September 2009 by Field G. Van Zee.