libFLAME Release Notes


Source code

libFLAME is provided as free software, licensed under the GNU Lesser General Public License (LGPL) in two forms:


What is libFLAME?

FLAME is a methodology for developing dense linear algebra libraries that is radically different from the LINPACK/LAPACK approach that dates back to the 1970s. By libFLAME we denote the library that has resulted from this project. For additional information, visit the FLAME home page.


What's provided by libFLAME?

The following libFLAME features benefit both basic and advanced users, as well as library developers:


Figure 1: Blocked Cholesky Factorization (variant 2) expressed as a FLAME algorithm.


FLA_Error FLA_Chol_l_blk_var2( FLA_Obj A, int nb_alg )
{
  FLA_Obj ATL,   ATR,      A00, A01, A02,
          ABL,   ABR,      A10, A11, A12,
                           A20, A21, A22;
  int b, value = 0;

  FLA_Part_2x2( A,    &ATL, &ATR,
                      &ABL, &ABR,     0, 0, FLA_TL );

  while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) ){

    b = min( FLA_Obj_length( ABR ), nb_alg );

    FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                        /* ************* */   /* ******************** */
                                                &A10, /**/ &A11, &A12,
                           ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                           b, b, FLA_BR );

    /* ---------------------------------------------------------------- */

    FLA_Syrk( FLA_LOWER_TRIANGULAR, FLA_NO_TRANSPOSE, 
              FLA_MINUS_ONE, A10, FLA_ONE, A11 );

    FLA_Gemm( FLA_NO_TRANSPOSE, FLA_TRANSPOSE, 
              FLA_MINUS_ONE, A20, A10, FLA_ONE, A21 );

    value = FLA_Chol_unb_external( FLA_LOWER_TRIANGULAR, A11 );

    if ( value != FLA_SUCCESS )
      return ( FLA_Obj_length( A00 ) + value );

    FLA_Trsm( FLA_RIGHT, FLA_LOWER_TRIANGULAR, 
              FLA_TRANSPOSE, FLA_NONUNIT_DIAG, 
              FLA_ONE, A11, A21 );

    /* ---------------------------------------------------------------- */

    FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                     A10, A11, /**/ A12,
                            /* ************** */  /* ****************** */
                              &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                              FLA_TL );
  }

  return value;
}

      SUBROUTINE DPOTRF( UPLO, N, A, LDA, INFO )

      CHARACTER          UPLO
      INTEGER            INFO, LDA, N
      DOUBLE PRECISION   A( LDA, * )

      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
      LOGICAL            UPPER
      INTEGER            J, JB, NB
      LOGICAL            LSAME
      INTEGER            ILAENV
      EXTERNAL           LSAME, ILAENV
      EXTERNAL           DGEMM, DPOTF2, DSYRK, DTRSM, XERBLA
      INTRINSIC          MAX, MIN

      INFO = 0
      UPPER = LSAME( UPLO, 'U' )
      IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN
         INFO = -1
      ELSE IF( N.LT.0 ) THEN
         INFO = -2
      ELSE IF( LDA.LT.MAX( 1, N ) ) THEN
         INFO = -4
      END IF
      IF( INFO.NE.0 ) THEN
         CALL XERBLA( 'DPOTRF', -INFO )
         RETURN
      END IF

      IF( N.EQ.0 )
     $   RETURN

      NB = ILAENV( 1, 'DPOTRF', UPLO, N, -1, -1, -1 )
      IF( NB.LE.1 .OR. NB.GE.N ) THEN
         CALL DPOTF2( UPLO, N, A, LDA, INFO )
      ELSE
         IF( UPPER ) THEN
*********** Upper triangular case omitted for purposes of fair comparison.
         ELSE
            DO 20 J = 1, N, NB
               JB = MIN( NB, N-J+1 )
               CALL DSYRK( 'Lower', 'No transpose', JB, J-1, -ONE,
     $                     A( J, 1 ), LDA, ONE, A( J, J ), LDA )
               CALL DPOTF2( 'Lower', JB, A( J, J ), LDA, INFO )
               IF( INFO.NE.0 )
     $            GO TO 30
               IF( J+JB.LE.N ) THEN
                  CALL DGEMM( 'No transpose', 'Transpose', N-J-JB+1, JB,
     $                        J-1, -ONE, A( J+JB, 1 ), LDA, A( J, 1 ),
     $                        LDA, ONE, A( J+JB, J ), LDA )
                  CALL DTRSM( 'Right', 'Lower', 'Transpose', 'Non-unit',
     $                        N-J-JB+1, JB, ONE, A( J, J ), LDA,
     $                        A( J+JB, J ), LDA )
               END IF
   20       CONTINUE
         END IF
      END IF
      GO TO 40
   30 CONTINUE
      INFO = INFO + J - 1
   40 CONTINUE
      RETURN
      END

Figure 2: FLAME/C code for the algorithm shown in Figure 1 (first listing above), representing the style of coding found in libFLAME, and Fortran-77 LAPACK code (second listing above) implementing the same algorithm.


Figure 3: Cholesky Factorization implementations compared on an 8-core Opteron system. Notes: For FLAME experiments, LAPACK was used only for the small unblocked Cholesky subproblem. GotoBLAS was configured to provide multithreaded parallelism for level-3 BLAS operations. Peak system performance is 38.4 GFLOPS.


Figure 4: Cholesky Factorization implementations compared on a 16-core Itanium2 system. Notes: libFLAME uses variant 3 while LAPACK uses variant 2. For non-SuperMatrix experiments, GotoBLAS was configured to provide multithreaded parallelism for level-3 BLAS operations. For SuperMatrix experiments, GotoBLAS parallelism was disabled. Theoretical peak system performance is 96 GFLOPS.


What's new in libFLAME 2.0?

We've added lots of functionality since libFLAME 1.0 was released on April 1, 2007. Here is a basic summary:

Library API and implementations

Build system


Status of operation support

libFLAME contains implementations of many operations that are provided by the BLAS and LAPACK libraries. However, not all FLAME implementations support every datatype. Also, in many cases, we use a different naming convention for our routine names. The following table summarizes which routines are supported within libFLAME and also provides their corresponding netlib name for reference.

Notes:


operation name                                   netlib    libFLAME  FLAME/C  FLASH    SuperMatrix  type     l2f
                                                 routine   routine                                  support  support
libFLAME routine prefix                                              FLA_     FLASH_*  FLASH_

Level-3 BLAS
general matrix-matrix multiply                   ?gemm     Gemm      y        y        y            sdcz     N/A
hermitian matrix-matrix multiply                 ?hemm     Hemm      y        y        y            sdcz     N/A
hermitian rank-k update                          ?herk     Herk      y        y        y            sdcz     N/A
hermitian rank-2k update                         ?her2k    Her2k     y        y        y            sdcz     N/A
symmetric matrix-matrix multiply                 ?symm     Symm      y        y        y            sdcz     N/A
symmetric rank-k update                          ?syrk     Syrk      y        y        y            sdcz     N/A
symmetric rank-2k update                         ?syr2k    Syr2k     y        y        y            sdcz     N/A
triangular matrix-matrix multiply                ?trmm     Trmm      y        y        y            sdcz     N/A
triangular solve with multiple right-hand sides  ?trsm     Trsm      y        y        y            sdcz     N/A

LAPACK
triangular transpose matrix-matrix multiply      ?lauum    Ttmm      y        y        y            sdcz     sdcz
Cholesky factorization                           ?potrf    Chol      y        y        y            sdcz     sdcz
LU factorization with no pivoting                ~         LU_nopiv  y        y        y            sdcz     sdcz
LU factorization with partial pivoting           ?getrf    LU_piv    y                              sdcz     sdcz
QR factorization                                 ?geqrf    QR        y                              sd       d
QR factorization via the UT transform            ~         QR_UT     y                              sd       d
LQ factorization                                 ?gelqf    LQ        y                              sd       d
LQ factorization via the UT transform            ~         LQ_UT     y                              sd       d
Reduction to upper Hessenberg form               ?gehrd    Hess      y                              d        d
Triangular matrix inversion                      ?trtri    Trinv     y        y        y            sdcz     sdcz
SPD matrix inversion                             ?potri +  SPDinv    y        y        y            sdcz     sdcz
Triangular Sylvester equation solve              ?trsyl ^  Sylv      y        y        y            sdcz     sdcz


LAPACK compatibility support in libFLAME

We provide an interface, liblapack2flame, which allows legacy codes that link to LAPACK to utilize libFLAME without any code changes. However, liblapack2flame does not provide interfaces to all routines within LAPACK. The column labeled "l2f support" in the above table shows which datatypes are supported for each operation.

In addition, liblapack2flame provides interfaces to some routines that depend on the above operations. An incomplete list of these routines is:

dgees, dgeesx, dgeev, dgeevx, dggev, dggevx, dgelq2, dgeqp3, dgeqr2, dggqrf, dggrqf, dgesdd, dgesvd, dposv, dposvx, dsygvd, dsygv, dsygvx, dgegs, dgegv, dgges, dggesx, dggglm, dgglse, dgelsy, dgelsd, dgelss


System and software requirements

Before you attempt to build libFLAME, be sure you have the following software tools:

Over time, libFLAME has been tested on a wide swath of modern architectures, including but not limited to x86 (Pentium/Athlon family), ia64 (Itanium family), x86_64 (Opteron/EM64T), and POWER4/5. Support for an architecture is primarily determined by the presence of an appropriate compiler. The configure script will attempt to find an appropriate compiler for a given architecture according to a predetermined search order for that architecture. For example, the first C compiler searched for on an Itanium2 system is Intel's icc. If icc is not found, then the search continues for GNU gcc. If gcc is not present, then the script checks for a generic compiler named cc. It is also possible for the user to specify the compiler explicitly at configure-time. Please see ./configure --help for further information on this and other related topics.


Building and Installing libFLAME

After downloading the software, you may proceed to build and install the libraries by performing the following steps. (Note here we assume you're building from a libflame 2.0 tarball.)

  1. tar xzf libflame-2.0.tar.gz

  2. cd libflame-2.0

  3. Configure the library. Please run ./configure --help for the full range of configure options.

    ./configure --prefix=<install_prefix>

    Alternatively, you may edit and run the configure wrapper in run-conf/run-configure.sh. Note that specifying the install prefix is optional. If it is omitted, the default is $HOME/flame (which we generally recommend).

  4. Compile the source code.

    make -j n

    The -j option is optional. When building libFLAME on an SMP or multicore system, you may effectively parallelize the compilation process by specifying an argument n greater than 1. In this case, make spawns n processes, allowing it to compile up to n files simultaneously.

  5. Install the library archive files to <install_prefix> ($HOME/flame by default).

    make install

At this point, the libFLAME libraries have been installed into the lib subdirectory of <install_prefix>. We recommend symbolically linking the libraries to abbreviated names that do not contain the version. In addition, you might also omit the architecture from the symbolic link name if you will only be linking code for one architecture. This can be done manually, or with the help of some optional post-installation make targets. Execute

make install-symlinks

to create symbolic links that omit both version and architecture strings from the symbolic link name, or

make install-symlinks-with-arch

to create links that omit the version but contain an architecture string. This allows one to distinguish among libraries compiled for different architectures.

In your application's makefile, refer to the symbolic link. When it comes time to install an updated version of libFLAME, you need only update the symbolic links to the FLAME libraries (i.e., execute make install-symlinks) rather than the makefiles of the programs that reference them.

Configure options

If you are interested in configuring libFLAME with non-default options, please see the output of configure --help. We've summarized the most commonly used configure options here:

--enable-optimizations (default: enabled)
    Employ traditional compiler optimizations when compiling C and Fortran source code.

--enable-warnings (default: enabled)
    Use the appropriate flag(s) to request warnings when compiling C and Fortran source code.

--enable-debug (default: disabled)
    Use the appropriate debug flag (usually -g) when compiling C and Fortran source code.

--enable-builtin-lapack-routines (default: disabled)
    Build and include into libFLAME blocked and unblocked LAPACK routines for all operations supported within libFLAME. When this option is disabled, LAPACK is required at link-time. Note that FLAME implementations of LAPACK operations (such as Cholesky, LU, and QR Factorizations) only use LAPACK code for their unblocked subproblems, though libFLAME also includes wrappers to external blocked implementations for reference testing. Enabling this option is useful when a user is setting up libFLAME for the first time and does not want to build LAPACK from source and has no intention of using a third-party library, such as MKL, to provide basic LAPACK functionality.

--enable-goto-interfaces (default: enabled)
    Enable code that interfaces with internal/low-level libgoto functionality, such as those symbols that may be queried for architecture-dependent blocksize values.

--enable-supermatrix (default: disabled)
    Enable Ernie Chan's dependency-aware task scheduling and parallel execution system.

--enable-multithreading=model (default: disabled)
    Enable multithreading support. Valid values for model are pthreads and openmp. Threading must be enabled to access SMP/multicore parallelized implementations.

--enable-memory-alignment=N (default: disabled)
    Enable code that aligns dynamically allocated memory regions at N-byte boundaries. Note: N must be a power of two and multiple of sizeof(void*), which is usually 4 on 32-bit architectures and 8 on 64-bit architectures.

--enable-internal-error-checking (default: enabled)
    Enable internal runtime consistency checks of function parameters and return values.

--enable-memory-counter (default: disabled)
    Enable code that keeps track of the balance between calls to FLA_malloc() and FLA_free(). Upon calling FLA_Finalize(), the counter value is output to standard error.
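
The constraint on N and the idea of an allocation balance can be illustrated with a short C sketch. This is not libFLAME's implementation: every name below (demo_alignment_is_valid, demo_malloc_aligned, demo_free, demo_finalize) is hypothetical, and the sketch assumes a C11 library that provides aligned_alloc. It only shows what "a power of two and multiple of sizeof(void*)" means in code, and how a counter that is incremented on allocation and decremented on release ends at zero when every allocation has been freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter: +1 per successful allocation, -1 per release. */
static long demo_alloc_balance = 0;

/* N is acceptable when it is a power of two and a multiple of sizeof(void*). */
int demo_alignment_is_valid(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0 && n % sizeof(void *) == 0;
}

/* Allocate at least size bytes on an n-byte boundary, or return NULL. */
void *demo_malloc_aligned(size_t n, size_t size)
{
    if (!demo_alignment_is_valid(n))
        return NULL;
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded = (size == 0) ? n : (size + n - 1) / n * n;
    void *p = aligned_alloc(n, rounded);
    if (p != NULL)
        ++demo_alloc_balance;
    return p;
}

/* Release a block obtained from demo_malloc_aligned. */
void demo_free(void *p)
{
    if (p != NULL) {
        free(p);
        --demo_alloc_balance;
    }
}

/* Print the balance to standard error and return it; 0 means nothing leaked. */
long demo_finalize(void)
{
    fprintf(stderr, "allocation balance: %ld\n", demo_alloc_balance);
    return demo_alloc_balance;
}
```

With these helpers, demo_alignment_is_valid( 16 ) holds while demo_alignment_is_valid( 24 ) does not, since 24 is not a power of two. A nonzero value from demo_finalize() indicates an allocation that was never released, which is the kind of imbalance the memory counter option is meant to expose.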


Building and installing GotoBLAS

The developers of libFLAME enthusiastically encourage users to use the GotoBLAS implementation of the Basic Linear Algebra Subprograms (BLAS). To obtain the source code for GotoBLAS, please visit the Texas Advanced Computing Center software site. After downloading, perform the following steps:

  1. tar xzf GotoBLAS-1.22.tar.gz

  2. cd GotoBLAS

  3. Please read the documentation that accompanies the GotoBLAS source.

  4. Most users may build the GotoBLAS library by running quickbuild.32bit or quickbuild.64bit. Alternatively, advanced users may view and edit Makefile.rule and then execute:

    make lib

  5. Copy the library archive to a more permanent directory. You should also symbolically link the libgoto library to an abbreviated name:

    ln -s libgoto_ITANIUM2-r1.10.a libgoto.a

  6. If multiple architecture builds of libgoto share the same directory, then you should include an architecture substring in the symbolic link name to differentiate the builds:

    ln -s libgoto_ITANIUM2-r1.10.a libgoto_ia64.a

We highly recommend using libFLAME with GotoBLAS! However, libFLAME will work with any BLAS library. If you want to use libFLAME with a different BLAS, use the configure-time option --disable-goto-interfaces before building libFLAME. If you have further questions about interfacing libFLAME with your preferred BLAS library, contact flame@cs.utexas.edu.


Linking your LAPACK-dependent application to libFLAME

  1. Develop your algorithm with your favorite implementation of LAPACK. Let's assume that you compile and link your code via

    gfortran ... -L<lapack_path> -L<blas_path> -llapack -lblas
    where -llapack links the standard LAPACK library and -lblas links your favorite BLAS library, located in <lapack_path> and <blas_path>, respectively.

  2. Once your code works correctly, link to the GotoBLAS library instead. (GotoBLAS often provides the fastest level-3 BLAS routines available.)

    gfortran ... -L<lapack_path> -L<goto_path> -llapack -lgoto
    where <goto_path> is the directory in which you keep the GotoBLAS library archive and its symbolic link.

  3. Now it is time to experiment with linking to the libFLAME libraries:

    gfortran ... -L<lapack_path> -L<goto_path> -L<flame_path> -llapack2flame -lflame -llapack -lgoto
    where <flame_path> is the directory <install_prefix>/lib. Recall that <install_prefix> was determined when you configured FLAME for compiling. If you did not specify <install_prefix>, then the default value is used ($HOME/flame/lib).

  4. Run your code with the new FLAME implementations of the supported LAPACK routines listed above.

Running an Example

We offer a step-by-step walkthrough for running two example programs included in the libflame source distribution: the first executes a sequential Cholesky factorization with conventional ("flat") matrix storage; the second executes a multithreaded Cholesky factorization using SuperMatrix and hierarchical storage.

We also encourage potential users to browse the code examples provided at our linear algebra wiki.


Beyond LAPACK

libFLAME provides functionality beyond LAPACK. For example, we have routines for updating an LU factorization with pivoting. Adding further operations is not our top priority at the moment. However, if you have an operation that you would like to see supported, it doesn't hurt to contact us with your request!


Thank us!

We are very insecure people. So, if you like the libraries and find them useful, send us a message! We even make it easy. In the top-level directory of the libFLAME distribution, execute:

make send-thanks

This will automatically e-mail us a message!


Questions?

Contact flame@cs.utexas.edu.


Last Updated on 6 August 2008 by Field G. Van Zee.