ITXGEMM

AT LAST!

John Gunnels (UT-Austin)
Greg Henry (Intel)
Robert van de Geijn (UT-Austin)

What's New
Release R1.1 for Intel Pentium (R) III

To be kept informed, sign the FLAME guest book.

Overview
About the current implementation
Obtaining the library
Performance
Publications
Related Projects
Commonly asked questions
Get on the FLAME mailing list
Known users
Future directions
Disclaimer

Overview

ITXGEMM is an implementation of matrix-matrix multiplication that builds on some recent theoretical results of ours that show how to take advantage of all layers of memory hierarchies on modern microprocessors. The project is a collaboration between Greg Henry at Intel (R), and John Gunnels and Robert van de Geijn at The University of Texas at Austin.
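
As a rough illustration of what "taking advantage of the memory hierarchy" means in practice, the sketch below shows a generic cache-blocked matrix-matrix multiply in C. It is not the ITXGEMM algorithm: the function name blocked_gemm, the single block size BLOCK, and the loop order are all assumptions chosen for brevity. A real implementation would pick block sizes per cache level and replace the innermost loops with a tuned kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block size; a real implementation would tune it per cache level. */
#define BLOCK 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Computes C += A * B for column-major matrices.
   A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc). */
void blocked_gemm(size_t m, size_t n, size_t k,
                  const double *A, size_t lda,
                  const double *B, size_t ldb,
                  double *C, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (size_t jj = 0; jj < n; jj += BLOCK) {
        size_t jend = min_sz(jj + BLOCK, n);
        for (size_t pp = 0; pp < k; pp += BLOCK) {
            size_t pend = min_sz(pp + BLOCK, k);
            for (size_t ii = 0; ii < m; ii += BLOCK) {
                size_t iend = min_sz(ii + BLOCK, m);
                /* Product of one block of A with one block of B; the operands
                   are small enough to be reused from cache. This is the role
                   an inner kernel plays. */
                for (size_t j = jj; j < jend; ++j) {
                    for (size_t p = pp; p < pend; ++p) {
                        double b = B[p + j * ldb];
                        for (size_t i = ii; i < iend; ++i)
                            C[i + j * ldc] += A[i + p * lda] * b;
                    }
                }
            }
        }
    }
}
```

Note that the routine accumulates into C, so the caller is expected to zero C first when a plain product is wanted.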

About the current implementation:

It targets specifically the Intel Pentium (TM) III processor for now.
Don't expect good performance on a Pentium (TM) II or Intel Celeron (TM) processor yet.
It is for the Linux operating system. Don't try to use it under Windows (TM) yet.
It only supports double-precision real (64-bit) arithmetic for now.
It is NOT thread-safe yet.
It comes with absolutely no guarantees yet.

Obtaining the library

If you would like to try out this implementation please go through the following steps:

Get the auxiliary routines from Greg's web site: http://www.cs.utk.edu/~ghenry/ITXGEMM/. These are assembly-coded routines that implement a matrix-matrix multiplication of submatrices that are staged to take maximal advantage of the L1 cache. Note: you need libITXauxR1.0PIII.a for ITXGEMM release R1.0.
Get our kernels, which stage the computation to take full advantage of the L2 and L1 caches, by clicking here.
Add the libraries listed below when you link your code.
To use our faster dgemm kernel while taking the other BLAS routines from another library, link in the following order:

libITXGEMMR1.0PIII.a OtherLibrary.a libITXauxR1.0PIII.a
Specifically,
- If you want to link Greg's complete library of BLAS for Linux, link
  
  libITXGEMMR1.0PIII.a sblas13d.a libITXauxR1.0PIII.a
- If you would like to link ATLAS:
  
  libITXGEMMR1.0PIII.a libatlas.a libITXauxR1.0PIII.a
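
The order matters because a static linker typically resolves symbols from archives left to right: listing the ITXGEMM archive first lets its dgemm satisfy the reference before the other BLAS library is searched, and the auxiliary archive comes last because the ITXGEMM kernels call into it. The sketch below shows how a C program might call dgemm through the usual Fortran-style BLAS interface. The symbol name dgemm_ (trailing underscore, arguments by reference) is the common convention for Fortran-compiled BLAS on Linux; that the ITXGEMM archive exports exactly this symbol is an assumption. The helper multiply and the macro USE_EXTERNAL_BLAS are hypothetical names, and the stand-in routine exists only so the sketch builds without any library.

```c
#include <assert.h>
#include <stddef.h>

/* Fortran-style BLAS prototype (assumed symbol name, see the text above). */
void dgemm_(const char *transa, const char *transb,
            const int *m, const int *n, const int *k,
            const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb,
            const double *beta, double *c, const int *ldc);

#ifndef USE_EXTERNAL_BLAS
/* Stand-in so the sketch builds without a BLAS library. It handles only the
   untransposed case and is NOT the ITXGEMM kernel. */
void dgemm_(const char *transa, const char *transb,
            const int *m, const int *n, const int *k,
            const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb,
            const double *beta, double *c, const int *ldc)
{
    assert(*transa == 'N' && *transb == 'N');
    for (int j = 0; j < *n; ++j) {
        for (int i = 0; i < *m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < *k; ++p)
                sum += a[i + (size_t)p * *lda] * b[p + (size_t)j * *ldb];
            double *cij = &c[i + (size_t)j * *ldc];
            /* With beta == 0 the old contents of C are ignored, as in BLAS. */
            *cij = *alpha * sum + (*beta == 0.0 ? 0.0 : *beta * *cij);
        }
    }
}
#endif

/* C = A * B for column-major A (m x k), B (k x n), C (m x n), stored
   contiguously so each leading dimension equals the row count. */
void multiply(int m, int n, int k,
              const double *a, const double *b, double *c)
{
    const double one = 1.0, zero = 0.0;
    dgemm_("N", "N", &m, &n, &k, &one, a, &m, b, &k, &zero, c, &m);
}

/* To link against the real libraries, one would compile with
   -DUSE_EXTERNAL_BLAS and list the archives in the order given above, e.g.
     prog.o libITXGEMMR1.0PIII.a OtherLibrary.a libITXauxR1.0PIII.a
   Any additional runtime libraries the archives need are not covered here. */
```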

Please

Do not redistribute the library.
Point others to this web page or Greg's web page instead.
Reference this work when you use it successfully for your own research.

Performance

How to do your own performance evaluation.
Note: ATLAS includes its own implementations of some LAPACK routines (e.g. dgetrf). Thus, for a fair comparison between ATLAS and ITXGEMM, you will need to order the libraries as follows when linking:
- First link with
  
  liblapack.a libITXGEMMR1.0PIII.a libatlas.a libITXauxR1.0PIII.a
  
  This forces a routine like dgetrf to be taken from LAPACK and then linked with the ITXGEMM matrix-matrix multiply. Time this.
- Then link only with
  
  liblapack.a libatlas.a
  
  in that order, and time how well the same LAPACK routine does with the ATLAS matrix-matrix multiply.
- Finally, link with
  
  libatlas.a liblapack.a
  
  in that order, and time the ATLAS dgetrf routine.
- Note that we have an optimized version of dgetrf that is faster than either ATLAS or LAPACK, but it is not yet part of the ITXGEMM release.
Naturally, ITXGEMM is fast.
Possibly the fastest by some measure. In particular, test performance for odd-sized matrices.
Test if it is fast for your application and let us know!
Next time someone promotes another package, ask for performance comparisons with ITXGEMM!
Performance results from the paper presented at ICCS01
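
For readers who want to run their own measurements, here is a minimal timing sketch in C. It relies on the standard operation count for matrix-matrix multiplication, 2*m*n*k floating-point operations for an m x k times k x n product. Everything else is an assumption made for illustration: the names gemm_mflops, naive_gemm, gemm_fn and time_gemm are hypothetical, clock() measures CPU time at coarse resolution, and naive_gemm is only a baseline to be swapped for a wrapper around whichever dgemm is being evaluated.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* MFLOPS rate for an (m x k) times (k x n) product that took `seconds`:
   each of the m*n results needs k multiplies and k adds. */
double gemm_mflops(int m, int n, int k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)m * (double)n * (double)k / seconds / 1.0e6;
}

/* Baseline C = A * B for column-major, contiguously stored matrices. */
void naive_gemm(int m, int n, int k,
                const double *a, const double *b, double *c)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i + (size_t)p * m] * b[p + (size_t)j * k];
            c[i + (size_t)j * m] = sum;
        }
    }
}

/* Hypothetical callback type for the routine under test. */
typedef void (*gemm_fn)(int m, int n, int k,
                        const double *a, const double *b, double *c);

/* Runs f on n x n operands `repeats` times and returns the average CPU
   seconds per call, or -1.0 if memory could not be allocated. */
double time_gemm(gemm_fn f, int n, int repeats)
{
    assert(n > 0 && repeats > 0);
    size_t count = (size_t)n * (size_t)n;
    double *a = malloc(count * sizeof *a);
    double *b = malloc(count * sizeof *b);
    double *c = malloc(count * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }
    for (size_t i = 0; i < count; ++i) {
        a[i] = 1.0;
        b[i] = 0.5;
        c[i] = 0.0;
    }
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r)
        f(n, n, n, a, b, c);
    clock_t stop = clock();
    free(a); free(b); free(c);
    return (double)(stop - start) / (double)CLOCKS_PER_SEC / repeats;
}
```

A typical use would call time_gemm for a range of sizes, including odd ones such as 333 or 501, and pass each positive result to gemm_mflops. For small matrices, raise repeats until the measured time is well above the clock resolution, since a measured time of zero cannot be converted to a rate.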

Related Publications

For related publications, see the FLAME publication web page.

Related Projects

FLAME: Formal Linear Algebra Methods Environment.
FLARE: Formal Linear Algebra Recovery Environment. (A fault-tolerant version of the BLAS fit for space travel)
PLAPACK: Parallel Linear Algebra Package

Commonly asked questions

Yes, we rely on assembly-coded kernels. There really are only three such kernels, and they are tiny by most measures. The rest is all in C.
Our thesis: To unleash the true power of a processor, one must code in assembly at least, and at most, the inner kernel, since compilers will always lag behind.
Yes, there are many opportunities for optimization left. We have only just begun.
Yes, we can do the same for other architectures. No, we are not funded to do so.
Yes, we can add our techniques to ATLAS to accelerate their performance. No, we are not funded to do so.
Yes, we submitted an extended abstract on our techniques to SC00. Lovely paper if I say so myself. They chose not to accept it. Thus, you will have to wait for the journal paper instead.
Yes, it is inconvenient to have two ".a" files. No, we do not have any bright ideas about how to otherwise handle the intellectual property rights questions the UT and Intel lawyers may ask.

Get on the FLAME mailing list

Please sign the FLAME guest book so we can keep you informed of new developments regarding ITXGEMM.

Users

Future Directions

We have a full set of level-3 BLAS coded using FLAME. They attain performance similar to that of the LU factorization shown on the performance web page. They will be released shortly. Get on our mailing list to remain informed.

The IA-64 will be targeted next.

Please give us feedback on how this kernel helps or hurts performance for your application by sending mail to flame@cs.utexas.edu.

Disclaimer

THE MATERIALS ARE PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT OF INTELLECTUAL PROPERTY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS OR ITS SUPPLIERS BE LIABLE FOR ANY DAMAGES WHATSOEVER (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, LOSS OF INFORMATION) ARISING OUT OF THE USE OF OR INABILITY TO USE THE MATERIALS, EVEN IF THE UNIVERSITY OF TEXAS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. BECAUSE SOME JURISDICTIONS PROHIBIT THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATION MAY NOT APPLY TO YOU. The University of Texas further does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within these materials. The University of Texas may make changes to these materials, or to the products described therein, at any time without notice. The University of Texas makes no commitment to update the Materials.

This web page is maintained by
Robert van de Geijn

flame@cs.utexas.edu

Last Updated: Dec. 14, 2000