Publications Related to the FLAME Project



Dissertations/Theses


  1. A Systematic Approach to the Design and Analysis of Linear Algebra Algorithms.
    John A. Gunnels.
    Ph.D. Dissertation. FLAME Working Note #6, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-44. Nov. 2001. (Supervised by Robert van de Geijn)
    [ BibTeX ]
  2. Mechanical Derivation and Systematic Analysis of Correct Linear Algebra Algorithms
    Paolo Bientinesi
    Ph.D. Dissertation. The University of Texas at Austin, Department of Computer Sciences. Aug. 2006. (Supervised by Robert van de Geijn)

  3. Formalized Parallel Dense Linear Algebra and its Application to the Generalized Eigenvalue Problem
    Jack Poulson
    Master's Thesis. The University of Texas at Austin, Department of Aerospace Engineering. May 2009. (Supervised by Jeffrey K. Bennighof)

  4. Application of Dependence Analysis and Runtime Data Flow Graph Scheduling to Matrix Computations
    Ernie Chan
    Ph.D. Dissertation. The University of Texas at Austin, Department of Computer Science. Aug 2010. (Supervised by Robert van de Geijn)

  5. Matrix Computations on Graphics Processors and Clusters of GPUs
    Francisco Daniel Igual Pena
    Ph.D. Dissertation. E. S. de Tecnologia y Ciencias Experimentales, Universidad Jaume I de Castellon, May 2011. (Supervised by Gregorio Quintana Orti and Rafael Mayo Gual)

  6. Algorithm/Architecture Codesign of Low Power and High Performance Linear Algebra Compute Fabrics
    Ardavan Pedram
    Ph.D. Dissertation. The University of Texas at Austin, Department of Electrical and Computer Engineering. Aug 2013. (Supervised by Andreas Gerstlauer and Robert van de Geijn)

  7. Finite Element Modeling of Electromagnetic Radiation and Induced Heat Transfer in the Human Body
    Kyungjoo Kim
    Ph.D. Dissertation. The University of Texas at Austin, Department of Engineering Mechanics. Aug 2013. (Supervised by Leszek Demkowicz, Victor Eijkhout, and Robert van de Geijn)

  8. A Calculus of Loop Invariants for Dense Linear Algebra Optimization
    Tze Meng Low
    Ph.D. Dissertation. The University of Texas at Austin, Department of Computer Science. December 2013. (Supervised by Robert van de Geijn)

  9. Design by Transformation: From Domain Knowledge to Optimized Program Generation
    Bryan Marker
    Ph.D. Dissertation. The University of Texas at Austin, Department of Computer Science. May 2014. (Supervised by Don Batory and Robert van de Geijn)

  10. Non-orthogonal Spin-adaptation and Application to Coupled Cluster up to Quadruple Excitations
    Devin Matthews
    Ph.D. Dissertation. The University of Texas at Austin, Department of Chemistry. August 2014. (Supervised by John Stanton)

  11. Distributed Memory Tensor Computations: Formalizing Distributions, Redistributions, and Algorithm Derivation.
    Martin D. Schatz
    Ph.D. Dissertation. The University of Texas at Austin, Department of Computer Science. Dec. 2015. (Supervised by Robert van de Geijn and Tamara Kolda)

  12. Theory and Practice of Classical Matrix-Matrix Multiplication for Hierarchical Memory Architectures.
    Tyler M. Smith
    Ph.D. Dissertation. The University of Texas at Austin, Department of Computer Science. Dec. 2017. (Supervised by Robert van de Geijn and Enrique Quintana-Orti.)

  13. The science of high performance algorithms for hierarchical matrices.
    Chen-Han Yu
    Ph.D. Dissertation. The University of Texas at Austin, Department of Computer Science. Aug. 2018. (Supervised by George Biros and Robert van de Geijn.)

  14. Practical fast matrix multiplication algorithms.
    Jianyu Huang
    Ph.D. Dissertation. The University of Texas at Austin, Department of Computer Science. Aug. 2018. (Supervised by Robert van de Geijn.)

Books


  1. Linear Algebra: Foundations to Frontiers - Notes on Numerical Linear Algebra
    Robert van de Geijn
    Manuscript in preparation with some video lectures (expect periodic changes)
  2. LAFF-On Programming for Correctness
    Notes, videos, interactive activities, and programming activities created for a Massive Open Online Course (MOOC) offered by edX
    Margaret Myers and Robert van de Geijn
    Self-published at ulaff.net, 2017
  3. Linear Algebra: Foundations to Frontiers - Notes to LAFF With
    Notes, videos, interactive activities, and programming activities created for a Massive Open Online Course (MOOC) offered by edX
    Margaret Myers, Pierce van de Geijn, Robert van de Geijn
    Self-published at ulaff.net, 2014
  4. Introduction to High Performance Scientific Computing
    Victor Eijkhout
    [ BibTeX ]
  5. libflame: The Complete Reference
    Field G. Van Zee
    www.lulu.com, 2009
    [ Free download ] [ Nightly updated ] [ FLAMEC BLAS Quickguide ]
    [ BibTeX ]
  6. The Science of Programming Matrix Computations
    Robert A. van de Geijn and Enrique S. Quintana-Orti
    www.lulu.com, 2008
    [ BibTeX ]
  7. Using PLAPACK: Parallel Linear Algebra Package
    Robert A. van de Geijn
    The MIT Press, 1997
    [ BibTeX ]

Journal Publications


    In review



    Accepted



    2021
  1. Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework.
    Field G. Van Zee, Devangi N. Parikh, and Robert A. van de Geijn.
    ACM Transactions on Mathematical Software, Volume 47, Issue 2, Article 12 (June 2021), 26 pages.

    2020
  3. Strassen's algorithm reloaded on GPUs.
    Jianyu Huang, Chenhan D Yu, Robert A van de Geijn.
    ACM Transactions on Mathematical Software (TOMS) 46 (1), 1-22, 2020.

  4. Implementing high-performance complex matrix multiplication via the 1m method
    Field G. Van Zee.
    SIAM Journal on Scientific Computing 42 (5), C221-C244, 2020.

    2019
  5. A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization with Partial Pivoting
    Sandra Catalan, Jose R. Herrero, Enrique S. Quintana-Orti, Rafael Rodriguez-Sanchez, Robert van de Geijn.
    IEEE Access 7, pages 17617-17633, 2019.

    2018
  6. Strassen's Algorithm for Tensor Contraction
    Jianyu Huang, Devin A. Matthews, Robert A. van de Geijn.
    SIAM Journal on Scientific Computing 40 (3), C305-C326, 2018

  7. Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors
    Sandra Catalán, José R. Herrero, Francisco D. Igual, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí, Chris Adeniyi-Jones.
    Journal of Computational Science, Volume 25, Pages 140-151, 2018.

    2017
  8. Implementing high-performance complex matrix multiplication via the 3m and 4m methods
    Field G. Van Zee and Tyler M. Smith.
    ACM Transactions on Mathematical Software, Volume 44 Issue 1, 1-36, 2017

  9. Householder QR Factorization With Randomization for Column Pivoting (HQRRP)
    Per-Gunnar Martinsson, Gregorio Quintana-Orti, Nathan Heavner, Robert van de Geijn.
    SIAM Journal on Scientific Computing, Vol. 39, Issue 2, C96-C115 (20 pages), 2017

    2016
  10. Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
    Sandra Catalan, Francisco D. Igual, Rafael Mayo, Rafael Rodriguez-Sanchez, Enrique S. Quintana-Orti.
    Cluster Computing 19, 1037-1051, 2016.

  11. Parallel Matrix Multiplication: A Systematic Journey
    Martin D. Schatz, Robert A. van de Geijn, Jack Poulson.
    SIAM Journal on Scientific Computing
    Vol. 38, Issue 6, 2016 (online)

  12. Analytical Modeling is Enough for High Performance BLIS.
    Tze Meng Low, Francisco D. Igual, Tyler M. Smith, and Enrique S. Quintana-Orti.
    ACM Transactions on Mathematical Software, Volume 43 Issue 2, September 2016
    [ BibTeX ]
  13. The BLIS Framework: Experiments in Portability
    Field G. Van Zee, Tyler Smith, Bryan Marker, Tze Meng Low, Robert A. van de Geijn, Francisco D. Igual, Mikhail Smelyanskiy, Xianyi Zhang, Michael Kistler, Vernon Austel, John Gunnels, Lee Killough.
    ACM Transactions on Mathematical Software
    Article No. 12, Volume 42, Issue 2, June 2016

  14. A Highly Efficient Multicore Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores
    Ardavan Pedram, John McCalpin, Andreas Gerstlauer.
    The Journal of Signal Processing Systems.

    2015
  15. BLIS: A Framework for Rapidly Instantiating BLAS Functionality
    Field G. Van Zee, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS)
    Volume 41, Issue 3, June 2015

    Information on a Massive Open Online Course based on this paper, which walks through the steps for optimizing matrix-matrix multiplication, can be found at ulaff.net.
  16. A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling
    Kyungjoo Kim, Victor Eijkhout.
    ACM Transactions on Mathematical Software
    Volume 41 Issue 1, October 2014

  17. Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi
    Manuel F. Dolz, Francisco D. Igual, Thomas Ludwig, Luis Piñuel, Enrique S. Quintana-Ortí.
    Computers & Electrical Engineering, 2015

  18. Non-orthogonal spin-adaptation of coupled cluster methods: A new implementation of methods including quadruple excitations.
    Devin A. Matthews and John F. Stanton
    The Journal of Chemical Physics, 142 (6), 2015.

    2014
  20. Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator.
    Ardavan Pedram, Andreas Gerstlauer, and Robert van de Geijn
    IEEE Transactions on Computers, Special Section on Computer Arithmetic, August 2014.
  21. Exploiting Symmetry in Tensors for High Performance.
    Martin D. Schatz, Tze Meng Low, Robert A. van de Geijn, Tamara G. Kolda.
    SIAM Journal on Scientific Computing, 36(5), Sep. 2014

  22. Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
    Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortí
    ACM Transactions on Mathematical Software (TOMS)
    April 2014
  23. Enhancing Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra.
    Pedro Alonso, Manuel F. Dolz, Francisco D. Igual, Rafael Mayo and Enrique S. Quintana-Ortí.
    Concurrency and Computation: Practice and Experience.
    (See also FLAME Working Note #73.)
    [ BibTeX ]
    2013
  24. High-Performance Solvers for Dense Hermitian Eigenproblems
    Matthias Petschow, Elmar Peise, Paolo Bientinesi
    SIAM Journal on Scientific Computing, Volume 35(1), pp. C1-C22, January 2013.

  25. A Case Study in Mechanically Deriving Dense Linear Algebra Code
    Bryan Marker, Don Batory, and Robert van de Geijn
    The International Journal of High Performance Computing Applications Volume 27 Issue 4, November 2013
    [ Abstract etc. ] [ BibTeX ] [ Available from the authors upon request ]
  26. Elemental: A New Framework for Distributed Memory Dense Matrix Computations
    Jack Poulson, Bryan Marker, Robert A. van de Geijn, Jeff R. Hammond, Nichols A. Romero
    ACM Transactions on Mathematical Software (TOMS), 2013

  27. Scheduling Algorithms-by-blocks on Small Clusters
    Francisco D. Igual, Gregorio Quintana-Orti, and Robert van de Geijn
    Concurrency and Computation: Practice and Experience
    [ Abstract etc. ][ BibTeX ]
    2012
  28. Deriving Linear Algebra Libraries
    Paolo Bientinesi, John Gunnels, Maggie Myers, Enrique Quintana-Orti, Tyler Rhodes, Robert van de Geijn, and Field Van Zee
    Formal Aspects of Computing
    [See also FLAWN57]
  29. Families of Algorithms for Reducing a Matrix to Condensed Form
    Field G. Van Zee, Robert A. van de Geijn, Gregorio Quintana-Ortí, G. Joseph Elizondo
    ACM Transactions on Mathematical Software (TOMS), 2012

  30. Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures
    Ardavan Pedram, Robert A. van de Geijn, Andreas Gerstlauer
    IEEE Transactions on Computers
    [ Abstract etc. ][ BibTeX ]
  31. Programming Many-Core Architectures - A Case Study: Dense Matrix Computations on the Intel SCC Processor
    Bryan Marker, Ernie Chan, Jack Poulson, Robert van de Geijn, Rob F. Van der Wijngaart, Timothy G. Mattson, and Theodore E. Kubaska
    Concurrency and Computation: Practice and Experience
    Volume 24, Issue 12, pages 1317-1333, 25 August 2012
    [ Abstract etc. ][ BibTeX ]
  32. A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
    Gregorio Quintana-Ortí, Francisco D. Igual, Mercedes Marqués, Enrique S. Quintana-Ortí, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2012

  33. The FLAME Approach: From Dense Linear Algebra Algorithms to High-Performance Multi-Accelerator Implementations
    Francisco D. Igual, Ernie Chan, Enrique S Quintana-Orti, Gregorio Quintana-Orti, Robert A van de Geijn, Field G van Zee
    Journal of Parallel and Distributed Computing
    [ Abstract etc. ][ BibTeX ]
    2011
  34. High-performance up-and-downdating via Householder-like transformations
    Robert A. van de Geijn, Field G. Van Zee
    ACM Transactions on Mathematical Software (TOMS), 2011
  35. Using desktop computers to solve large-scale dense linear algebra problems
    Mercedes Marques, Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Robert van de Geijn
    The Journal of Supercomputing, Vol. 58, Issue 2, 2011
    [ Abstract etc. ][ BibTeX ]
  36. Goal-Oriented and Modular Stability Analysis
    Paolo Bientinesi, Robert A. van de Geijn
    SIAM Journal on Matrix Analysis and Applications, Volume 32 Issue 1, February 2011
    [ Abstract, etc. ] [ BibTeX ]
    2010
  37. Sparse Direct Factorizations through Unassembled Hyper-Matrices
    Paolo Bientinesi, Victor Eijkhout, Kyungjoo Kim, Jason Kurtz, and Robert van de Geijn
    Computer Methods in Applied Mechanics and Engineering, 199, 430--438, 2010
    [ Abstract etc. ] [ BibTeX ]
  38. Toward Mechanical Derivation of Krylov Solver Libraries
    Victor Eijkhout, Paolo Bientinesi, Robert van de Geijn
    Procedia Computer Science, 1(1) 1805-1813, 2010 (Proceedings of ICCS2010.)
    [ Abstract etc. ] [ BibTeX ]
    2009
  39. Programming matrix algorithms-by-blocks for thread-level parallelism
    Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Robert A. van de Geijn, Field G. Van Zee, Ernie Chan
    ACM Transactions on Mathematical Software (TOMS), 2009

  40. The libflame Library for Dense Matrix Computations
    Field G. Van Zee, Ernie Chan, Robert A. van de Geijn, Enrique S. Quintana-Orti, Gregorio Quintana-Orti
    IEEE Computing in Science and Engineering, Vol. 11, No 6, November/December 2009
    [ Abstract, etc. ] [ BibTeX ]
    2008
  41. Updating an LU Factorization with Pivoting
    Enrique S. Quintana-Orti, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2008

  42. High-performance implementation of the level-3 BLAS
    Kazushige Goto, Robert van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2008

  43. Families of algorithms related to the inversion of a Symmetric Positive Definite matrix
    Paolo Bientinesi, Brian Gunter, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2008

  44. Anatomy of high-performance matrix multiplication
    Kazushige Goto, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2008

  45. Scalable parallelization of FLAME code via the workqueuing model
    Field G. Van Zee, Paolo Bientinesi, Tze Meng Low, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2008

    2007
  46. Collective communication: theory, practice, and experience
    Ernie Chan, Marcel Heimlich, Avi Purkayastha, Robert van de Geijn
    Concurrency and Computation: Practice & Experience, Volume 19 Issue 1, September 2007
    [ Abstract, etc. ] [ BibTeX ]
    2006
  47. Improving the performance of reduction to Hessenberg form
    Gregorio Quintana-Orti, Robert van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2006

  48. Accumulating Householder transformations, revisited
    Thierry Joffrain, Tze Meng Low, Enrique S. Quintana-Orti, Robert van de Geijn, Field G. Van Zee
    ACM Transactions on Mathematical Software (TOMS), 2006

    2005
  49. A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations
    Paolo Bientinesi, Inderjit S. Dhillon, Robert A. van de Geijn
    SIAM Journal on Scientific Computing, Volume 27 Issue 1, July 2005
    [ Abstract, etc. ] [ BibTeX ]
  50. Representing linear algebra algorithms in code: the FLAME application program interfaces
    Paolo Bientinesi, Enrique S. Quintana-Orti, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2005

  51. Parallel out-of-core computation and updating of the QR factorization
    Brian C. Gunter, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2005

  52. The science of deriving dense linear algebra algorithms
    Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Orti, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2005

    2003
  53. Formal derivation of algorithms: The triangular Sylvester equation
    Enrique S. Quintana-Orti, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2003

    2001
  54. FLAME: Formal Linear Algebra Methods Environment
    John A. Gunnels, Fred G. Gustavson, Greg M. Henry, Robert A. van de Geijn
    ACM Transactions on Mathematical Software (TOMS), 2001

Chapters


    2022
  1. Applying Dijkstra's Vision to Numerical Software.
    Robert van de Geijn and Maggie Myers.
    In: Edsger Wybe Dijkstra: His Life, Work, and Legacy (1st ed.).
    Association for Computing Machinery, New York, NY, USA, 215-230. 2022.

    2012
  2. The Spike Factorization as Domain Decomposition Method; Equivalent and Variant Approaches
    Victor Eijkhout and Robert van de Geijn
    In High-Performance Scientific Computing (Michael W. Berry, Kyle A. Gallivan, Efstratios Gallopoulos, Ananth Grama, Bernard Philippe, Yousef Saad, and Faisal Saied, eds.) pp. 157-169. Springer London. 2012.
    [ PDF ]
    2011
  3. All-to-All
    Jesper Larsson Traeff and Robert A. van de Geijn.
    Encyclopedia of Parallel Computing, Part 1, Pages 42-47. 2011.

  4. Collective Communication
    Robert van de Geijn and Jesper Larsson Traeff.
    Encyclopedia of Parallel Computing, Part 3, Pages 318-327. 2011

  5. Broadcast
    Jesper Larsson Traeff and Robert A. van de Geijn.
    Encyclopedia of Parallel Computing, Part 2, Pages 186-192. 2011

  6. libflame
    Field G. Van Zee, Ernie Chan and Robert A. van de Geijn.
    Encyclopedia of Parallel Computing, Part 12, Pages 1010-1014, 2011

  7. Allgather
    Jesper Larsson Traeff and Robert A. van de Geijn.
    Encyclopedia of Parallel Computing, Part 1, Pages 39-42. 2011

  8. BLAS (Basic Linear Algebra Subprograms)
    Robert van de Geijn and Kazushige Goto
    Encyclopedia of Parallel Computing, Part 2, Pages 157-164. 2011

Conference and Workshop Publications


    2018
  1. Learning from Optimizing Matrix-Matrix Multiplication
    Devangi N. Parikh, Jianyu Huang, Margaret E. Myers and Robert A. van de Geijn
    2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018

    2017
  2. Generating Families of Practical Fast Matrix Multiplication Algorithms
    Jianyu Huang, Leslie Rice, Devin A. Matthews, Robert A. van de Geijn.
    Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS'17), 2017

  3. Reduction to Tridiagonal Form for Symmetric Eigenproblems on Asymmetric Multicore Processors
    Pedro Alonso, Sandra Catalan, Jose R. Herrero, Enrique S. Quintana-Orti, Rafael Rodriguez-Sanchez.
    8th International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2017).

    2016
  4. Strassen's Algorithm Reloaded
    Jianyu Huang, Tyler M. Smith, Greg M. Henry, Robert A. van de Geijn.
    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16)
    [ BibTeX ]
    2015
  5. Performance Optimization for the K-nearest Neighbors Kernel on x86 Architectures.
    Chenhan D. Yu, Jianyu Huang, Woody Austin, Bo Xiao, and George Biros.
    Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'15)
    [ BibTeX ]
  6. Refactoring conventional task schedulers to exploit asymmetric ARM big.LITTLE architectures.
    Luis Costero, Francisco D. Igual, Katzalin Olcoz, Sandra Catalan, Rafael Rodriguez-Sanchez, Enrique S. Quintana-Orti.
    6th Int. Workshop on Accelerators and Hybrid Exascale Systems -- AsHES 2016 (accepted and pending publication). Chicago (USA). 2016.

    2014
  7. Understanding Performance Stairs: Elucidating Heuristics
    Bryan Marker, Don Batory, Robert van de Geijn.
    29th IEEE/ACM International Conference on Automated Software Engineering (ASE 2014). Accepted.

  8. Anatomy of High-Performance Many-Threaded Matrix Multiplication
    Tyler M. Smith, Robert van de Geijn, Mikhail Smelyanskiy, Jeff R. Hammond, and Field G. Van Zee.
    International Parallel and Distributed Processing Symposium 2014. Accepted.

    2013
  9. Transforming a Linear Algebra Core to an FFT Accelerator.
    Ardavan Pedram, John McCalpin, and Andreas Gerstlauer.
    ASAP 2013, to appear.
    [ PDF (draft)]
  10. Code Generation and Optimization of Distributed-Memory Dense Linear Algebra Kernels
    Bryan Marker, Don Batory, and Robert van de Geijn.
    International Workshop on Automatic Performance Tuning (iWAPT'13)
    [ PDF ]
  11. Floating Point Architecture Extensions for Optimized Matrix Factorization
    Ardavan Pedram, Andreas Gerstlauer and Robert van de Geijn.
    21st IEEE International Symposium on Computer Arithmetic, to be held in Austin, Texas, USA in April 2013. Accepted.
    [ PDF ]
    2012
  12. On the Efficiency of Register File versus Broadcast Interconnect for Collective Communications in Data-Parallel Hardware Accelerators
    Ardavan Pedram, Andreas Gerstlauer and Robert van de Geijn.
    SBAC-PAD 2012. Accepted.
    [ PDF (draft)]
  13. Level-3 BLAS on the TI C6678 multi-core DSP
    Murtaza Ali, Eric Stotzer, Francisco D. Igual, and Robert van de Geijn.
    SBAC-PAD 2012. Accepted.
    [ PDF (draft)]
  14. Unleashing the high-performance and low-power of multi-core DSPs for general-purpose HPC
    Francisco D. Igual, Murtaza Ali, Arnon Friedmann, Eric Stotzer, Timothy Wentz, and Robert van de Geijn.
    SC12. Accepted.
    [ PDF (draft)]
  15. Designing Linear Algebra Algorithms by Transformation: Mechanizing the Expert Developer
    Bryan Marker, Jack Poulson, Don Batory, and Robert van de Geijn
    iWAPT2012.
    [ PDF (draft)]
  16. A Linear Algebra Core Design for Efficient Level-3 BLAS
    Ardavan Pedram, Syed Gilani, Nam Sung Kim, Robert van de Geijn, Michael Schulte, Andreas Gerstlauer. (poster)
    ASAP, 2012.
    [ PDF (draft)]
    2011
  17. A High-Performance, Low-Power Linear Algebra Core
    Ardavan Pedram, Andreas Gerstlauer, and Robert van de Geijn
    22nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2011), 2011
    [ PDF (draft)] [ Abstract etc. ] [ BibTeX ]
  18. Retargeting PLAPACK to Clusters with Hardware Accelerators
    Manuel Fogue, Francisco D. Igual, Enrique Quintana-Orti, and Robert van de Geijn.
    2010 International Conference on High Performance Computing and Simulation (HPCS 2010), 2010
    [ PDF (draft)] [ Abstract etc. ] [ BibTeX ]
    2010
  19. Managing the complexity of lookahead for LU factorization with pivoting
    Ernie Chan, Robert van de Geijn, Andrew Chapman
    SPAA '10 Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures, 2010
    [ Abstract, etc. ] [ BibTeX ]
  20. Transforming Linear Algebra Libraries: From Abstraction to Parallelism
    Ernie Chan, Jim Nagle, Robert van de Geijn, and Field G. Van Zee.
    HIPS'10: Proceedings of Fifteenth International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2010
    [ PDF] [ Abstract etc. ] [ BibTeX ]
    2009
  21. Out-of-Core Computation of the QR Factorization on Multi-Core Processors
    Mercedes Marques, Gregorio Quintana-Orti, Enrique S. Quintana-Orti, and Robert van de Geijn.
    Proceedings of the 15th International Euro-Par Conference on Parallel Processing (Euro-Par 2009), 2009
    [ PDF] [ Abstract etc. ] [ BibTeX ]
  22. Solving "Large" Dense Matrix Problems on Multi-Core Processors and GPUs
    Mercedes Marques, Gregorio Quintana-Orti, Enrique S. Quintana-Orti, and Robert van de Geijn.
    10th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing - PDSEC'09. Rome (Italy), 2009.
    [ PDF] [ Abstract etc. ] [ BibTeX ]
  23. Using Graphics Processors to Accelerate the Solution of Out-of-Core Linear Systems
    Mercedes Marques, Gregorio Quintana-Orti, Enrique S. Quintana-Orti, and Robert van de Geijn.
    IEEE International Symposium on Parallel and Distributed Computing, Lisbon (Portugal), 2009.
    [ PDF] [ Abstract etc. ] [ BibTeX ]
  24. Fast Development of Dense Linear Algebra Codes on Graphics Processors
    Maria Jesus Zafont, Alberto Martin, Francisco D. Igual, and Enrique S. Quintana-Orti.
    14th International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2009.
    [ PDF] [ Abstract etc. ] [ BibTeX ]
  25. Solving dense linear systems on platforms with multiple hardware accelerators
    Gregorio Quintana-Orti, Francisco D. Igual, Enrique S. Quintana-Orti, Robert A. van de Geijn
    PPoPP '09 Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, 2009
    [ Abstract, etc. ] [ BibTeX ]
    2008
  26. High performance dense linear algebra on a spatially distributed processor
    Jeffrey R. Diamond, Behnam Robatmili, Stephen W. Keckler, Robert van de Geijn, Kazushige Goto, Doug Burger
    PPoPP '08 Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, 2008
    [ Abstract, etc. ] [ BibTeX ]
  27. An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization
    Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Alfredo Remon, and Robert A. van de Geijn.
    In High Performance Computing for Computational Science - VECPAR 2008, 2008
    [ PDF] [ Abstract etc. ] [ BibTeX ]
  28. Design of Scalable Dense Linear Algebra Libraries for Multithreaded Architectures: the LU Factorization
    Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Ernie Chan, Robert van de Geijn, and Field G. Van Zee.
    Workshop on Multithreaded Architectures and Applications, MTAAP 2008
    [ PDF] [ BibTeX ]
  30. Scheduling of QR factorization algorithms on SMP and multi-core architectures
    Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Ernie Chan, Field G. Van Zee, and Robert A. van de Geijn.
    Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), 2008
    [ PDF] [ Abstract etc. ] [ BibTeX ]
    2007
  31. Satisfying your Dependencies with SuperMatrix
    Ernie Chan, Field G. Van Zee, Enrique S. Quintana-Orti, Gregorio Quintana-Orti, Robert van de Geijn.
    Proceedings of IEEE Cluster Computing 2007, pp. 91 - 99, Austin, Texas, September 2007.
    [ PDF] [ Abstract etc. ] [ BibTeX ]
  32. Toward Scalable Matrix Multiply on Multithreaded Architectures
    Bryan Marker, Field Van Zee, Kazushige Goto, Gregorio Quintana-Orti, Robert van de Geijn.
    Proceedings of European Conference on Parallel and Distributed Computing (EuroPar 2007), pp. 748-757, 2007.
    [ PDF ] [ Bibtex entry] [ BibTeX ]
  33. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
    Ernie Chan, Enrique S. Quintana-Orti, Gregorio Quintana-Orti, Robert van de Geijn
    SPAA '07 Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, 2007
    [ Abstract, etc. ] [ Bibtex entry ] [ BibTeX ]
  34. Formal Correctness and Stability of Linear Algebra Algorithms
    Paolo Bientinesi and Robert van de Geijn.
    IMACS05.
    [ Postscript] [ PDF] [ BibTeX ]
    2006
  35. Collective communication on architectures that support simultaneous communication over multiple links
    Ernie Chan, Robert van de Geijn, William Gropp, Rajeev Thakur
    PPoPP '06 Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, 2006
    [ Abstract, etc. ] [ BibTeX ]
    2005
  36. Extracting SMP parallelism for dense linear algebra algorithms from high-level specifications
    Tze Meng Low, Robert A. van de Geijn, Field G. Van Zee
    PPoPP '05 Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, 2005
    [ Abstract, etc. ] [ BibTeX ]
  37. A Family of High-Performance Matrix Multiplication Algorithms
    John Gunnels, Fred Gustavson, Greg Henry, and Robert A. van de Geijn.
    PARA 2004, LNCS 3732, pp. 256-265, 2005.
    [ BibTeX ]
  38. Rapid Development of High-Performance Linear Algebra Libraries
    Paolo Bientinesi, John Gunnels, Fred Gustavson, Greg Henry, Margaret Myers, Enrique S. Quintana-Orti, and Robert A. van de Geijn.
    PARA 2004, LNCS 3732, pp. 376--384, 2005.
    [ Postscript] (early draft) [ PDF (early draft)] [ BibTeX ]
  39. Automatic Derivation of Linear Algebra Algorithms with Application to Control Theory
    Paolo Bientinesi, Sergey Kolos, and Robert A. van de Geijn
    PARA 2004, LNCS 3732, pp. 385--394, 2005.
    [ Postscript] (early draft) [ PDF (early draft)] [ BibTeX ]
  40. Rapid Development of High-Performance Out-of-Core Solvers
    Thierry Joffrain, Enrique S. Quintana-Orti, and Robert A. van de Geijn.
    PARA 2004, LNCS 3732, pp. 413--422, 2005.
    [ Postscript(early draft)] [ PDF (early draft)] [ BibTeX ]
    2001
  41. A Family of High-Performance Matrix Algorithms
    John A. Gunnels, Greg M. Henry, and Robert A. van de Geijn.
    In Computational Science - 2001, Part I Lecture Notes in Computer Science 2073, pp. 51-60, Springer, 2001.
    [ PDF] [ Abstract etc. ] [ BibTeX ]
  42. Fault-Tolerant High-Performance Matrix-Matrix Multiplication: Theory and Practice
    John A. Gunnels, Daniel S. Katz, Enrique S. Quintana-Orti, and Robert van de Geijn.
    International Conference for Dependable Systems and Networks (DSN-2001), pp. 47-56, July 2-4, 2001.
    [ Abstract etc. ] [ BibTeX ]
  43. Formal Methods for High-Performance Linear Algebra Libraries
    John Gunnels and Robert van de Geijn
    The Architecture of Scientific Software: IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software, October 2-4, 2000, Ottawa, Canada (Ronald F. Boisvert and P. T. Tang, editors), pp. 193-210, Kluwer Academic Press, 2001
    [ BibTeX ]

FLAME Working Notes (FLAWNS) and arXiv publications


  1. GEMMFIP: Unifying GEMM in BLIS
    RuQing G. Xu, Field G. Van Zee, Robert A. van de Geijn
    arXiv:2302.08417 [cs.MS], 2023

  2. Cascading GEMM: High Precision from Low Precision
    Devangi N. Parikh, Robert A. van de Geijn, Greg M. Henry
    arXiv:2303.04353 [cs.MS], 2023

  3. The MOMMS Family of Matrix Multiplication Algorithms
    Tyler M. Smith, Robert A. van de Geijn
    arXiv:1904.05717 [cs.MS], 2019

  4. Supporting mixed-datatype matrix multiplication within the BLIS framework
    Field G. Van Zee, Devangi N. Parikh, Robert A. van de Geijn
    FLAME Working Note #89, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-19-01. 2019. Also available from arXiv.

  5. Implementing Strassen's Algorithm with CUTLASS on NVIDIA Volta GPUs
    Jianyu Huang, Chenhan D. Yu, Robert A. van de Geijn
    FLAME Working Note #88, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-17-08. 2018. Also available from arXiv.

  6. A Simple Methodology for Computing Families of Algorithms
    Devangi N. Parikh, Maggie E. Myers, Richard Vuduc, Robert A. van de Geijn
    FLAME Working Note #87, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-17-06. 2018. Also available from arXiv.

  7. Deriving Correct High-Performance Algorithms
    Devangi N. Parikh, Maggie E. Myers, Robert A. van de Geijn
    FLAME Working Note #86, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-17-07. 2017. Also available from arXiv.

  8. Inducing complex matrix multiplication via the 1m method
    Field G. Van Zee
    FLAME Working Note #85, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-17-03. 2017.

  9. Strassen's Algorithm for Tensor Contraction
    Jianyu Huang, Devin A. Matthews, and Robert A. van de Geijn.
    FLAME Working Note #84, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-17-02. April 3, 2017.

  10. Pushing the Bounds for Matrix-Matrix Multiplication
    Tyler M. Smith and Robert A. van de Geijn
    FLAME Working Note #83, arXiv:1702.02017, Feb. 3, 2017.

  11. Generating Families of Practical Fast Matrix Multiplication Algorithms
    Jianyu Huang, Leslie Rice, Devin A. Matthews, and Robert A. van de Geijn.
    FLAME Working Note #82, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-16-17. November 3, 2016.
    [ BibTeX ]
  12. Inducing complex matrix multiplication via the 3m and 4m methods
    Field G. Van Zee and Tyler M. Smith.
    FLAME Working Note #81, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-16-17. October 18, 2016.
    [ BibTeX ]
  13. BLISlab: A Sandbox for Optimizing GEMM
    Jianyu Huang and Robert A. van de Geijn.
    FLAME Working Note #80, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-16-13. August 31, 2016.
    [ BibTeX ]
  14. Implementing Strassen's Algorithm with BLIS.
    Jianyu Huang, Tyler M. Smith, Greg M. Henry, Robert A. van de Geijn.
    FLAME Working Note #79, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-16-03. April 16, 2016.
    [ BibTeX ]
  15. Householder QR Factorization: Adding Randomization for Column Pivoting.
    Per-Gunnar Martinsson, Gregorio Quintana-Orti, Nathan Heavner, Robert van de Geijn
    FLAME Working Note #78, arXiv:1512.02671. Dec. 2015.
    [ BibTeX ]
  16. Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra
    Luis Costero, Francisco D. Igual, Katzalin Olcoz, Enrique S. Quintana-Orti.
    FLAME Working Note #77, arXiv:1509.02058. Sept. 2015.
    [ BibTeX ]
  17. Toward ABFT for BLIS GEMM.
    Tyler M. Smith, Robert A. van de Geijn, Mikhail Smelyanskiy, Enrique S. Quintana-Orti.
    FLAME Working Note #76, The University of Texas at Austin, Department of Computer Science. Report TR-15-05. Originally published June 13, 2015 and revised Nov. 5, 2015.
    [ BibTeX ]
  18. DxTer: An Extensible Tool for Optimal Dataflow Program Generation.
    Bryan Marker, Martin Schatz, Devin Matthews, Isil Dillig, Robert van de Geijn, and Don Batory.
    FLAME Working Note #75, The University of Texas at Austin, Department of Computer Science. Technical Report TR-15-03. 2015.
    [ BibTeX ]
  19. Analytical Models for the BLIS Framework.
    Tze Meng Low, Francisco D. Igual, Tyler M. Smith, and Enrique S. Quintana-Orti.
    FLAME Working Note #74. (Submitted to ACM TOMS.)
    [ BibTeX ]
  20. Enhancing Performance and Energy Consumption of Runtime Schedulers for Dense Linear Algebra.
    Pedro Alonso, Manuel F. Dolz, Francisco D. Igual, Rafael Mayo and Enrique S. Quintana-Ortí.
    FLAME Working Note #73. (To appear in Concurrency and Computation: Practice and Experience.)
    [ BibTeX ]
  21. Anatomy of Parallel Computation with Tensors
    Martin D. Schatz
    FLAME Working Note #72. The University of Texas at Austin, Department of Computer Science. Technical Report TR-13-21. 2013.
    [ BibTeX ]
  22. Opportunities for Parallelism in Matrix Multiplication
    Tyler M. Smith, Robert van de Geijn, Mikhail Smelyanskiy, Jeff R. Hammond, and Field G. Van Zee.
    FLAME Working Note #71. The University of Texas at Austin, Department of Computer Science. Technical Report TR-13-20. 2013.

    To appear as:
    Anatomy of High-Performance Many-Threaded Matrix Multiplication
    Tyler M. Smith, Robert van de Geijn, Mikhail Smelyanskiy, Jeff R. Hammond, and Field G. Van Zee.
    International Parallel and Distributed Processing Symposium 2014.
    [ BibTeX ]
  23. Adding Aggressive Early Deflation to the Restructured Symmetric QR Algorithm
    James Levitt.
    FLAME Working Note #70. The University of Texas at Austin, Department of Computer Science. Honors Thesis Report HR-13-07. 2013.
    [ BibTeX ]
  24. Implementing Level-3 BLAS with BLIS: Early Experience
    Field G. Van Zee, Tyler Smith, Francisco D. Igual, Mikhail Smelyanskiy, Xianyi Zhang, Michael Kistler, Vernon Austel, John Gunnels, Tze Meng Low, Bryan Marker, Lee Killough, Robert A. van de Geijn.
    FLAME Working Note #69. The University of Texas at Austin, Department of Computer Science. Technical Report TR-13-03. 2013.
    [ BibTeX ]
  25. Exploiting Symmetry in Tensors for High Performance: an Initial Study
    Martin D. Schatz, Tze Meng Low, Robert A. van de Geijn, Tamara G. Kolda
    FLAME Working Note #68. The University of Texas at Austin, Department of Computer Science. Technical Report TR-12-33. 2012.
    [ BibTeX ]
  26. Code Generation of Optimized Distributed-Memory Dense Linear Algebra Kernels
    Bryan Marker, Don Batory, and Robert A. van de Geijn
    FLAME Working Note #67. The University of Texas at Austin, Department of Computer Science. Technical Report TR-12-31. 2012.
    [ BibTeX ]
  27. BLIS: A Framework for Generating BLAS-like Libraries
    Field G. Van Zee and Robert A. van de Geijn
    FLAME Working Note #66. The University of Texas at Austin, Department of Computer Science. Technical Report TR-12-30. 2012.
    [ BibTeX ]

    We recommend you read the more up-to-date version of this paper instead:
    BLIS: A Framework for Rapid Instantiation of BLAS Functionality
    Field G. Van Zee, Robert A. van de Geijn.
    ACM Transactions on Mathematical Software

  28. A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling
    Kyungjoo Kim and Victor Eijkhout
    FLAME Working Note #65. The University of Texas at Austin, Texas Advanced Computing Center. Technical Report TR-12-05. 2012.
    [ BibTeX ]
  29. Theory and Practice of Fusing Loops when Optimizing Parallel Dense Linear Algebra Operations
    Tze Meng Low, Bryan Marker, Robert van de Geijn
    FLAME Working Note #64. The University of Texas at Austin, Department of Computer Science. Technical Report TR-12-18. August 2012.
    [ BibTeX ]
  30. Dense Matrix Computation on a Heterogeneous Architecture: A Block Synchronous Approach
    Kyungjoo Kim, Victor Eijkhout, and Robert van de Geijn.
    FLAME Working Note #63. Texas Advanced Computing Center, The University of Texas at Austin. Technical Report TR-12-04. 2012.
    [ BibTeX ]
  31. Parallel Matrix Multiplication: 2D and 3D
    Martin Schatz, Jack Poulson, and Robert van de Geijn.
    FLAME Working Note #62. The University of Texas at Austin, Department of Computer Science. Technical Report TR-12-13. June 2012.
    [ BibTeX ]
  32. Unleashing DSPs for General-Purpose HPC.
    Francisco D. Igual, Murtaza Ali, Arnon Friedmann, Eric Stotzer, Timothy Wentz, and Robert van de Geijn.
    FLAME Working Note #61. The University of Texas at Austin, Department of Computer Science. Technical Report TR-12-02. February 2012.
    [ BibTeX ]
  33. Restructuring the QR Algorithm for High-Performance Application of Givens Rotations.
    Field G. Van Zee, Robert van de Geijn, Gregorio Quintana-Orti.
    FLAME Working Note #60. The University of Texas at Austin, Department of Computer Science. Technical Report TR-11-36. October 2011. (Submitted to ACM TOMS.)
    [ BibTeX ]
  34. Co-Design Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures.
    Ardavan Pedram, Andreas Gerstlauer, and Robert A. van de Geijn.
    FLAME Working Note #59. The University of Texas at Austin, Computer Engineering Research Center, Technical Report UT-CERC-12-02. Oct. 2011.
    [ BibTeX ]
  35. Mechanizing the Expert Dense Linear Algebra Developer.
    Bryan Marker, Andy Terrel, Jack Poulson, Don Batory, and Robert van de Geijn.
    FLAME Working Note #58. The University of Texas at Austin, Department of Computer Science. Technical Report TR-11-18. April 2011. (Refined paper submitted to PPoPP'12.)
    [ BibTeX ]
  36. Deriving Linear Algebra Libraries.
    Robert van de Geijn, Tyler Rhodes, Maggie Myers, and Field Van Zee.
    FLAME Working Note #57. The University of Texas at Austin, Department of Computer Science. Technical Report TR-11-09. March 2011. (Submitted to FAC.)
    [ BibTeX ]
  37. Parallel Algorithms for Reducing the Generalized Hermitian-Definite Eigenvalue Problem.
    Jack Poulson, Robert van de Geijn, and Jeffrey Bennighof.
    FLAME Working Note #56. The University of Texas at Austin, Department of Computer Science. Technical Report TR-11-05. Feb. 2011.
    [ Related FLAME@lab (M-script for Matlab and Octave) implementations ] [ BibTeX ]
  38. Programming Many-Core Architectures - A Case Study: Dense Matrix Computations on the Intel SCC Processor.
    Bryan Marker, Ernie Chan, Jack Poulson, Robert van de Geijn, Rob F. Van der Wijngaart, Timothy G. Mattson, and Theodore E. Kubaska.
    FLAME Working Note #55. The University of Texas at Austin, Department of Computer Science. Technical Report TR-11-03. Jan. 2011.
    [ BibTeX ]
  39. Architecture Design by Transformation
    Taylor L. Riche, Don Batory, Rui Goncalves, Bryan Marker.
    FLAME Working Note #54. The University of Texas at Austin, Department of Computer Science. Technical Report TR-10-39. Dec. 14, 2010.
    [ BibTeX ]
  40. Algorithms for Reducing a Matrix to Condensed Form.
    Field G. Van Zee, Robert van de Geijn, Gregorio Quintana-Orti, and G. Joseph Elizondo.
    FLAME Working Note #53. The University of Texas at Austin, Department of Computer Science. Technical Report TR-10-37. Oct. 29, 2010.
    [ BibTeX ]
  41. MR3-SMP: A Symmetric Tridiagonal Eigensolver for Multi-Core Architectures.
    Matthias Petschow and Paolo Bientinesi.
    FLAME Working Note #52. Aachen Institute for Computational Engineering Science, RWTH Aachen. AICES-2010/10-2, October 2010.
    [ BibTeX ]
  42. Automatic Generation of Partitioned Matrix Expressions for Matrix Operations.
    Diego Fabregat and Paolo Bientinesi.
    FLAME Working Note #51. Aachen Institute for Computational Engineering Science, RWTH Aachen. AICES-2010/10-1, October 2010.
    [ BibTeX ]
  43. Runtime Data Flow Graph Scheduling of Matrix Computations with Multiple Hardware Accelerators
    Ernie Chan and Francisco D. Igual.
    FLAME Working Note #50. The University of Texas at Austin, Department of Computer Science. Technical Report TR-10-36. Oct. 14, 2010.
    [ BibTeX ]
  44. Towards a High Performance, Low Power Linear Algebra Processor
    Ardavan Pedram, Andreas Gerstlauer, and Robert van de Geijn.
    FLAME Working Note #49. The University of Texas at Austin, Computer Engineering Research Center. Technical Report UT-CERC-10-03. September 1, 2010.
    [ BibTeX ]
  45. Solving Linear Algebra Problems on Distributed-Memory Computers using Serial Codes
    Francisco D. Igual and Gregorio Quintana-Orti.
    FLAME Working Note #48. Universidad Jaume I, Depto. de Ingenieria y Ciencia de Computadores. Technical Report DICC 2010-07-01. July 31, 2010.
    [ BibTeX ]
  46. Proof-driven Derivation of Krylov Solver Libraries.
    Victor Eijkhout, Paolo Bientinesi, Robert van de Geijn.
    FLAME Working Note #47. The University of Texas at Austin, Texas Advanced Computing Center. Technical Report TR-10-02, 2010.
    [ BibTeX ]
  47. Toward Mechanical Derivation of Krylov Solver Libraries.
    Victor Eijkhout, Paolo Bientinesi, Robert van de Geijn.
    FLAME Working Note #46. The University of Texas at Austin, Texas Advanced Computing Center. Technical Report TR-10-01, 2010.
    [ BibTeX ]
  48. Formal correctness proof of mechanically derived CG methods.
    Paolo Bientinesi, Margaret Myers, and Robert van de Geijn.
    FLAME Working Note #45. The University of Texas at Austin, Texas Advanced Computing Center. Technical Report TR-09-06, 2009.
    [ BibTeX ]
  49. Elemental: A New Framework for Distributed Memory Dense Matrix Computations.
    Jack Poulson, Bryan Marker, Jeff R. Hammond, Nichols A. Romero, and Robert van de Geijn.
    FLAME Working Note #44. The University of Texas at Austin, Department of Computer Science. Technical Report TR-10-20. June, 2010. Revised January 2011.
    [ BibTeX ]
  50. A Run-Time System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures.
    Gregorio Quintana-Orti, Francisco D. Igual, Mercedes Marques, Enrique Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #43. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-10-10. March 31, 2010.
    [ BibTeX ]
  51. Retargeting PLAPACK to Clusters with Hardware Accelerators.
    Manuel Fogue, Francisco D. Igual, Enrique Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #42. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-10-06. February 11, 2010.
    [ BibTeX ]
  52. High-Performance Up-and-Downdating via Householder-like Transformations.
    Robert A. van de Geijn and Field G. Van Zee.
    FLAME Working Note #41. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-10-04. January 30, 2010.
    [ BibTeX ]
  53. Toward Mechanical Derivation of Krylov Solver Libraries.
    Victor Eijkhout, Paolo Bientinesi, and Robert van de Geijn.
    FLAME Working Note #40. Texas Advanced Computing Center. Technical Report TR-10-01. 2010.
    [ BibTeX ]
  54. Runtime Data Flow Scheduling of Matrix Computations.
    Ernie Chan.
    FLAME Working Note #39. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-09-22. August 10, 2009.
    [ BibTeX ]
  55. Transforming Linear Algebra Libraries: From Abstraction to Parallelism.
    Ernie Chan, Jim Nagle, Robert van de Geijn, and Field G. Van Zee.
    FLAME Working Note #38. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-09-17. May 27, 2009.
    [ BibTeX ]
  56. Level-3 BLAS on a GPU: Picking the Low Hanging Fruit
    Francisco D. Igual, Gregorio Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #37. Universidad Jaume I, Depto. de Ingenieria y Ciencia de Computadores. Technical Report DICC 2009-04-01. April 30, 2009, Updated May 21, 2009.
    [ BibTeX ]
  57. Solving "Large" Dense Matrix Problems on Multi-Core Processors and GPUs
    Mercedes Marques, Gregorio Quintana-Orti, Enrique S. Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #36. Universidad Jaume I, Depto. de Ingenieria y Ciencia de Computadores. Technical Report ICC 01-01-2009. Jan. 7, 2009.
    [ BibTeX ]
  58. FLAMES2S: From Abstraction to High Performance.
    Richard Veras, Jonathan Monette, Enrique Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #35. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-49. Dec. 14, 2008.
    [ BibTeX ]
  59. Beautiful Parallel Code: Evolution vs. Intelligent Design.
    Robert van de Geijn.
    Presented at Supercomputing 2008 Workshop on Node Level Parallelism for Large Scale Supercomputers, Austin, Texas, November 2008. FLAME Working Note #34. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-46. Nov. 21, 2008.
    [ BibTeX ]
  60. The Science of Deriving Stability Analyses.
    Paolo Bientinesi and Robert A. van de Geijn.
    FLAME Working Note #33. Aachen Institute for Computational Engineering Sciences, RWTH Aachen. TR AICES-2008-2. November 2008.
    [ BibTeX ]
  61. Solving Dense Linear Algebra Problems on Platforms with Multiple Hardware Accelerators.
    Gregorio Quintana-Orti, Francisco D. Igual, Enrique S. Quintana-Orti, Robert van de Geijn.
    FLAME Working Note #32. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-22. May 9, 2008.
    [ BibTeX ]
  62. Making Programming Synonymous with Programming for Linear Algebra Libraries.
    Maribel Castillo, Ernie Chan, Francisco D. Igual, Rafael Mayo, Enrique S. Quintana-Orti, Gregorio Quintana-Orti, Robert van de Geijn, Field G. Van Zee.
    FLAME Working Note #31. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-20. April 17, 2008.
    [ BibTeX ]
  63. FLAG@lab: An M-script API for Linear Algebra Operations on Graphics Processors.
    Sergio Barrachina, Maribel Castillo, Francisco D. Igual, Rafael Mayo, Enrique S. Quintana-Orti.
    FLAME Working Note #30. Universidad Jaume I, Depto. de Ingenieria y Ciencia de Computadores. Technical Report ICC 01-02-2008. February 14, 2008.
    [ BibTeX ]
  64. Programming Algorithms-by-Blocks for Matrix Computations on Multithreaded Architectures.
    Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Ernie Chan, Field G. Van Zee, and Robert van de Geijn.
    FLAME Working Note #29. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-04. January 15, 2008.
    [ BibTeX ]
  65. On Composing Matrix Multiplication from Kernels.
    Bryan Marker.
    FLAME Working Note #28. The University of Texas at Austin, Department of Computer Sciences. Report# HR-07-32 (honors thesis). Spring 2007. 21 pages.
    [ BibTeX ]
  66. SuperMatrix for the Factorization of Band Matrices.
    Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Alfredo Remon, Robert van de Geijn.
    FLAME Working Note #27. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-07-51. September 24, 2007.
    [ BibTeX ]
  67. Design and Scheduling of an Algorithm-by-Blocks for LU Factorization on Multithreaded Architectures.
    Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Ernie Chan, Robert van de Geijn, Field G. Van Zee.
    FLAME Working Note #26. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-07-50. September 19, 2007.
    [ BibTeX ]
  68. SuperMatrix: A Multithreaded Runtime Scheduling System for Algorithms-by-Blocks.
    Ernie Chan, Field G. Van Zee, Paolo Bientinesi, Enrique S. Quintana-Orti, Gregorio Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #25. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-07-41. August 22, 2007.
    [ BibTeX ]
  69. Scheduling of QR factorization algorithms on SMP and multi-core architectures.
    Gregorio Quintana-Orti, Enrique S. Quintana-Orti, Ernie Chan, Field G. Van Zee, and Robert van de Geijn.
    FLAME Working Note #24. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-07-37. July 31, 2007.
    [ BibTeX ]
  70. SuperMatrix Out-of-Order Scheduling of Matrix Operations for SMP and Multi-Core Architectures.
    Ernie Chan, Enrique S. Quintana-Orti, Gregorio Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #23. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-06-67. December 18, 2006.
    [ BibTeX ]
  71. Collective Communication: Theory, Practice, and Experience.
    Ernie Chan, Marcel Heimlich, Avijit Purkayastha, and Robert van de Geijn.
    FLAME Working Note #22. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-06-44. September 26, 2006.
    [ Source code for InterCol library] [ BibTeX ]
  72. Updating an LU Factorization with Pivoting.
    Enrique S. Quintana-Orti and Robert van de Geijn.
    FLAME Working Note #21. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2006-42.
    [ FLAME/C implementations] [ BibTeX ]
  73. High-Performance Implementation of the Level-3 BLAS.
    Kazushige Goto and Robert van de Geijn.
    FLAME Working Note #20. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2006-23.
    [ BibTeX ]
  74. Families of Algorithms Related to the Inversion of a Symmetric Positive Definite Matrix.
    Paolo Bientinesi, Brian Gunter, and Robert van de Geijn.
    FLAME Working Note #19. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2006-20.
    [ BibTeX ]
  75. Application Interface to Parallel Dense Matrix Libraries: Just let me solve my problem!
    H. Carter Edwards and Robert A. van de Geijn.
    FLAME Working Note #18. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2006-15.
    [ BibTeX ]
  76. Representing Dense Linear Algebra Algorithms: A Farewell to Indices.
    Paolo Bientinesi and Robert van de Geijn.
    FLAME Working Note #17. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2006-10.
    [ BibTeX ]
  77. FLAME 2005 Prospectus: Towards the Final Generation of Dense Linear Algebra Libraries.
    Paolo Bientinesi, Kazushige Goto, Tze Meng Low, Enrique S. Quintana-Orti, Robert van de Geijn, and Field Van Zee.
    FLAME Working Note #16. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2005-15.
    [ BibTeX ]
  78. Parallelizing FLAME Code with OpenMP Task Queues.
    Tze Meng Low, Kent Milfeld, Robert van de Geijn, and Field Van Zee.
    FLAME Working Note #15. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2004-50.
    [ BibTeX ]
  79. Improving the Performance of Reduction to Hessenberg Form.
    Gregorio Quintana-Orti and Robert van de Geijn.
    FLAME Working Note #14. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2004-44. Oct 2004.
    [ Postscript ] [ BibTeX ]
  80. On Accumulating Householder Transformations.
    Thierry Joffrain, Tze Meng Low, Enrique S. Quintana-Orti, Robert van de Geijn, and Field Van Zee.
    FLAME Working Note #13. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2004-43. Oct 2004.
    [ Postscript] [ BibTeX ]
  81. An API for Manipulating Matrices Stored by Blocks.
    Tze Meng Low and Robert van de Geijn.
    FLAME Working Note #12. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2004-15. May 2004.
    [ BibTeX ]
  82. FLAME@lab: A Farewell to Indices.
    Paolo Bientinesi, Enrique S. Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #11. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2003-11. April 2003.
    [ Postscript (gzipped)] [ BibTeX ]
  83. Representing Linear Algebra Algorithms in Code: The FLAME API.
    Robert A. van de Geijn.
    FLAME Working Note #10. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2003-01. Jan. 2003.
    [ Postscript (gzipped)] [ BibTeX ]
  84. On Reducing TLB Misses in Matrix Multiplication.
    Kazushige Goto and Robert van de Geijn.
    FLAME Working Note #9. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2002-55. Nov. 2002.
    [ Postscript (gzipped)] [ BibTeX ]
  85. The Science of Deriving Dense Linear Algebra Algorithms.
    Paolo Bientinesi, John A. Gunnels, Margaret E. Myers, Enrique S. Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #8. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2002-53. Sept. 2002.
    [ Postscript (gzipped)]
  86. Flexible High-Performance Matrix Multiply via a Self-Modifying Runtime Code.
    Greg M. Henry.
    FLAME Working Note #7. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-46. Dec. 2001.
    [ Postscript (gzipped)] [ BibTeX ]
  87. A Systematic Approach to the Design and Analysis of Linear Algebra Algorithms.
    John A. Gunnels.
    Ph.D. Dissertation. FLAME Working Note #6, The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-44. Nov. 2001.
    [ BibTeX ]
  88. Formal Derivation of Algorithms: The Triangular Sylvester Equation.
    Enrique S. Quintana-Orti and Robert van de Geijn.
    FLAME Working Note #5. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-35. Sept. 2001.
    [ Postscript (gzipped)] [ BibTeX ]
  89. High-Performance Matrix Multiplication Algorithms for Architectures with Hierarchical Memories.
    John Gunnels, Greg Henry, and Robert van de Geijn.
    FLAME Working Note #4. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-22. June 2001.
    [ Postscript (gzipped)] [ Further performance graphs related to this paper.] [ BibTeX ]
  90. Developing Linear Algebra Algorithms: A Collection of Class Projects.
    John Gunnels and Robert van de Geijn.
    FLAME Working Note #3. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2001-19. May 2001.
    [ PDF ] [ BibTeX ]
  91. Fault-Tolerant High-Performance Matrix-Matrix Multiplication.
    John A. Gunnels, Daniel S. Katz, Enrique S. Quintana-Orti, and Robert van de Geijn.
    FLAME Working Note #2. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2000-34. December 2000.
    [ Postscript ] [ BibTeX ]
  92. Formal Linear Algebra Methods Environment (FLAME): Overview.
    John Gunnels, Greg Henry, and Robert van de Geijn.
    FLAME Working Note #1. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-2000-28. November 2000.
    [ BibTeX ] [ Postscript]

Lecture Notes/Demos

These are notes Robert created for his Numerical Linear Algebra graduate course.
  1. Notes on Matrix and Vector Operations
    Robert van de Geijn
    [ PDF ]
  2. Notes on Vector and Matrix Norms
    Robert van de Geijn
    [ PDF ]
  3. Notes on the Singular Value Decomposition
    Robert van de Geijn
    [ PDF ]
  4. Notes on Gram-Schmidt QR Factorization
    Robert van de Geijn
    [ PDF ]
  5. Notes on Householder QR Factorization
    Robert van de Geijn
    [ PDF ]
  6. Notes on Linear Least-Squares
    Robert van de Geijn
    [ PDF ]
  7. Notes on the FLAME APIs
    Robert van de Geijn
    [ PDF ]
  8. Notes on Conditioning
    Robert van de Geijn
    [ PDF ]
  9. Notes on Numerical Stability
    Robert van de Geijn
    [ PDF ]
  10. Notes on LU Factorization
    Robert van de Geijn
    [ PDF ]
  11. Notes on Cholesky Factorization
    Robert van de Geijn
    [ PDF ] [ Video/Materials ]
  12. Notes on Rank-K Approximation
    Robert van de Geijn
    [ PDF ] [ lenna.m file]

Other publications


  1. A Highly-Efficient Implementation of the Doktorov Recurrence Equations for Franck-Condon Calculations
    Scott Michael Rabidoux, Victor Eijkhout, and John F. Stanton.
    Journal of Chemical Theory and Computation, 2016.
    [ BibTeX ]
  2. A Parallelization Strategy for Large-Scale Vibronic Coupling Calculations
    Scott Michael Rabidoux, Victor Eijkhout, and John F. Stanton.
    The Journal of Physical Chemistry A, 2015.
    [ BibTeX ]
  3. Parallelizing dense and banded linear algebra libraries using SMPSs
    Rosa M. Badia, Jose R. Herrero, Jesus Labarta, Josep M. Perez, Enrique S. Quintana-Orti and Gregorio Quintana-Orti
    Department of Computer Architecture, Universitat Politecnica de Catalunya. Technical Report UPC-DAC-RR-2008-64. 2008.
  4. Parallel MoM using Higher-Order Basis Functions and PLAPACK In-core and Out-of-core Solvers for Challenging EM Simulations
    Y. Zhang, R. A. van de Geijn, M. C. Taylor, and T. K. Sarkar
    IEEE Antennas and Propagation Magazine, Volume 51, Issue 5, Oct. 2009 Page(s):42-60.
    [ PDF ]
  5. Parallel MoM Using Higher Order Basis Functions and PLAPACK Out-of-Core Solver for a Challenging Vivaldi Array
    Mary C. Taylor, Yu Zhang, Tapan K. Sarkar, Robert A. van de Geijn
    Antennas and Propagation Society International Symposium, 2008. AP-S 2008. IEEE, 5-11 July 2008.
    [ PDF ]

rvdg@cs.utexas.edu
Last modified: Sun Apr 9 22:51:36 CDT 2023