Section 5.2 Monday Aug. 28

Subsection 5.2.1 Monday 8:30 - 8:55 Coffee and muffins

Subsection 5.2.2 Monday 8:55 - 9:00 Welcome by Robert van de Geijn

Subsection 5.2.3 Monday 9:00 - 10:45 Session 1

Subsubsection 5.2.3.1 BLIS V: The Final Frontier

Devin Matthews
Southern Methodist University

Abstract:

BLIS makes BLAS better... more expressive and flexible interfaces, rapid instantiation on new architectures, a well-documented and clean codebase... there's a lot about BLIS to love. But BLIS has been evolving. What is BLIS, and what should it be? Where is BLIS going in the future? Will BLIS ever be "done"? I will discuss these and similar questions, centered on work funded by the NSF over the past three years to expand the range of BLIS functionality.

Related materials: slides/Devin_blisretreat2023.pdf
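
As a taste of the "more expressive and flexible interfaces" mentioned in the abstract, here is a minimal sketch using BLIS's object API, following the style of the examples in the BLIS documentation (sizes are arbitrary; this is an illustration, not code from the talk):

    #include "blis.h"

    int main(void)
    {
        obj_t a, b, c, alpha, beta;
        dim_t m = 4, n = 4, k = 4;

        /* Create matrices; dimensions and strides live inside the
           objects, so there is no lda/ldb/ldc bookkeeping at the
           call site. Passing 0, 0 lets BLIS choose the strides. */
        bli_obj_create(BLIS_DOUBLE, m, k, 0, 0, &a);
        bli_obj_create(BLIS_DOUBLE, k, n, 0, 0, &b);
        bli_obj_create(BLIS_DOUBLE, m, n, 0, 0, &c);
        bli_obj_create_1x1(BLIS_DOUBLE, &alpha);
        bli_obj_create_1x1(BLIS_DOUBLE, &beta);

        bli_randm(&a);
        bli_randm(&b);
        bli_setsc(1.0, 0.0, &alpha);
        bli_setsc(0.0, 0.0, &beta);

        /* c := alpha * a * b + beta * c */
        bli_gemm(&alpha, &a, &b, &beta, &c);
        bli_printm("c", &c, "%5.2f", "");

        bli_obj_free(&a); bli_obj_free(&b); bli_obj_free(&c);
        bli_obj_free(&alpha); bli_obj_free(&beta);
        return 0;
    }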

Subsubsection 5.2.3.2 L1 and L2 API Optimizations

Harihara Sudhan
AMD India

Subsubsection 5.2.3.3 Performance improvements of NRM2

Eleni Vlachopoulou
AMD UK

Related materials: slides/BLIS_retreat_nrm2.pdf
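
For context on why NRM2 is worth a talk at all: a robust Euclidean norm cannot simply be sqrt(dot(x,x)), because the intermediate squares can overflow or underflow even when the result is representable. Below is the textbook scaled-sum-of-squares approach (in the style of the reference BLAS/LAPACK routine), shown only for background; it is not AMD's optimized implementation:

    #include <math.h>

    /* Overflow/underflow-safe Euclidean norm: track the largest
       magnitude seen (scale) and a sum of squares of values divided
       by that scale (ssq), so every intermediate stays in range. */
    double nrm2(int n, const double *x)
    {
        double scale = 0.0, ssq = 1.0;
        for (int i = 0; i < n; i++) {
            double xi = fabs(x[i]);
            if (xi == 0.0) continue;
            if (scale < xi) {
                /* Rescale the running sum to the new, larger scale. */
                ssq = 1.0 + ssq * (scale / xi) * (scale / xi);
                scale = xi;
            } else {
                ssq += (xi / scale) * (xi / scale);
            }
        }
        return scale * sqrt(ssq);
    }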

Subsubsection 5.2.3.4 Additional Discussion


Subsection 5.2.4 Monday 10:45 - 11:00 Break

Subsection 5.2.5 Monday 11:00 - 12:30 Session 2

Subsubsection 5.2.5.1 The CLAG Framework: The Arm Performance Libraries approach to implementing BLAS

Joe Dobson
Arm UK

Abstract:

An overview of the design and implementation decisions behind Arm Performance Libraries' BLAS framework.

Subsubsection 5.2.5.2 Updates on Practical Strassen's Algorithms

Rodrigo Brandao
UT Austin

Collaborative work with Devangi Parikh

Abstract:

In this talk we will discuss practical implementations of Strassen's and other fast matrix multiplication (FMM) algorithms. With the recent interest in discovering faster matrix multiplication algorithms using reinforcement learning, we investigate whether these new algorithms offer a practical benefit.

Related materials: slides/Rodrigo_BLISRetreat2023.pdf
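
For reference, the base case that makes Strassen's algorithm work: multiplying 2x2 (block) matrices with 7 multiplications instead of the naive 8. Applied recursively to matrix blocks, this yields O(n^2.807) arithmetic complexity. A minimal scalar sketch for illustration only, not the implementation discussed in the talk:

    /* One level of Strassen's algorithm on 2x2 matrices: seven
       products m1..m7 instead of eight. In a practical FMM code,
       each scalar here would be a submatrix and each '*' a
       recursive multiply. */
    void strassen_2x2(const double A[2][2], const double B[2][2],
                      double C[2][2])
    {
        double m1 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1]);
        double m2 = (A[1][0] + A[1][1]) * B[0][0];
        double m3 = A[0][0] * (B[0][1] - B[1][1]);
        double m4 = A[1][1] * (B[1][0] - B[0][0]);
        double m5 = (A[0][0] + A[0][1]) * B[1][1];
        double m6 = (A[1][0] - A[0][0]) * (B[0][0] + B[0][1]);
        double m7 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1]);

        C[0][0] = m1 + m4 - m5 + m7;
        C[0][1] = m3 + m5;
        C[1][0] = m2 + m4;
        C[1][1] = m1 - m2 + m3 + m6;
    }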

Subsubsection 5.2.5.3 Updates on casting higher precision in lower precision

Greg Henry
UT Austin

Collaborative work with Devangi Parikh.

Related paper: Cascading GEMM: High Precision from Low Precision (https://arxiv.org/abs/2303.04353)
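
The intuition behind casting higher precision into lower precision, as a hedged scalar sketch (the paper applies the idea at the level of whole GEMMs, not scalars): split each double into a leading float plus a float correction, then recover a nearly double-precision product from the cross products of the pieces. Each float has a 24-bit significand, so the product of two pieces is exact in a double-width accumulator; hardware with low-precision multipliers and wide accumulators exploits exactly this.

    #include <stdio.h>

    /* Split a double into hi + lo, both exactly representable
       as floats. */
    static void split(double x, float *hi, float *lo)
    {
        *hi = (float)x;
        *lo = (float)(x - (double)*hi);
    }

    /* Approximate x*y from products of the float pieces, accumulated
       in double. Illustrative only; the cascading-GEMM scheme does
       this with matrices and low-precision GEMM calls. */
    static double product_from_floats(double x, double y)
    {
        float xh, xl, yh, yl;
        split(x, &xh, &xl);
        split(y, &yh, &yl);
        return (double)xh * yh + (double)xh * yl
             + (double)xl * yh + (double)xl * yl;
    }

    int main(void)
    {
        double x = 1.0 / 3.0, y = 3.1415926535897932;
        printf("exact:    %.17g\n", x * y);
        printf("cascaded: %.17g\n", product_from_floats(x, y));
        return 0;
    }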

Subsubsection 5.2.5.4 Additional Discussion


Subsection 5.2.6 Monday 12:30 - 1:30 Lunch

Subsection 5.2.7 Monday 1:30 - 3:00 Session 3

Subsubsection 5.2.7.1 Status of acceleration with libflame and BLIS

Johannes Dieterich
AMD Austin

Related materials not available. You may want to contact the speaker with questions.

Subsubsection 5.2.7.2 A Generalized Micro-kernel Abstraction for GPU Linear Algebra

Vijay Thakkar
NVIDIA and Georgia Tech

Collaborative work with Cris Cecka

Related materials: slides/Thakkar_BLISRetreat2023.pdf

Subsubsection 5.2.7.3 An introduction to the SMaLL Framework for ML libraries

Upasana Sridhar
CMU

Abstract:

We describe SMaLL, a framework for rapidly developing high-performance ML libraries for CPU-based platforms. We adopt an approach similar to that of BLIS by restricting the design effort to a small set of kernels invoked from standard loop-nest bodies. This allows us to target new hardware rapidly and avoids the overheads associated with translating ML primitives to linear algebra.

Related materials: slides/Upasala_BLISRetreat2023.pdf
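
A hedged sketch of the loop-nest-plus-kernel idea from the abstract (hypothetical names and shapes, not the actual SMaLL API): the portable loop nest below computes a direct 2-D convolution, and only the innermost dot-product kernel would be specialized per platform, mirroring how BLIS isolates architecture-specific code in a microkernel.

    #include <stddef.h>

    /* The only piece that would be rewritten (vectorized) per
       architecture: a small, fixed-shape kernel. */
    static float dot_kernel(const float *x, const float *w, size_t n)
    {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++)
            acc += x[i] * w[i];
        return acc;
    }

    /* Direct convolution as a plain loop nest over output pixels; no
       im2col/GEMM translation, hence no packing or materialization
       overhead. 'valid' padding: output is (ih-fh+1) x (iw-fw+1). */
    void conv2d_valid(const float *in, size_t ih, size_t iw,
                      const float *flt, size_t fh, size_t fw,
                      float *out)
    {
        size_t oh = ih - fh + 1, ow = iw - fw + 1;
        for (size_t y = 0; y < oh; y++)
            for (size_t x = 0; x < ow; x++) {
                float acc = 0.0f;
                for (size_t fy = 0; fy < fh; fy++)
                    acc += dot_kernel(&in[(y + fy) * iw + x],
                                      &flt[fy * fw], fw);
                out[y * ow + x] = acc;
            }
    }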

Subsubsection 5.2.7.4 Additional Discussion


Subsection 5.2.8 Monday 3:00 - 3:15 Break

Subsection 5.2.9 Monday 3:15 - 5:00 Session 4

Subsubsection 5.2.9.1 Code Generation for BLIS/BLAS via Exo

Grace Dinh
UC Berkeley

Subsubsection 5.2.9.2 RandBLAS: An aspiring standard library and why it matters

Kaiwen He
Purdue University

Related materials: slides/RandLAPACK_presentations.pdf
Related paper: https://arxiv.org/pdf/2302.11474.pdf
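
For readers unfamiliar with what a "RandBLAS" would standardize: the core primitive is applying a random sketching operator S to a matrix A, so that the much smaller B = S*A preserves A's geometry with high probability. A minimal dense Gaussian sketch, purely conceptual (real implementations use counter-based RNGs and structured or sparse operators; this is not the RandBLAS API):

    #include <stdlib.h>
    #include <math.h>

    /* Sample a standard normal via Box-Muller. */
    static double gauss(void)
    {
        const double two_pi = 6.283185307179586;
        double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        return sqrt(-2.0 * log(u1)) * cos(two_pi * u2);
    }

    /* Fill S (d x m, row-major, d << m) with i.i.d. N(0, 1/d) entries
       so that E[S^T S] = I, then form the sketch B = S * A,
       where A is m x n and B is d x n. */
    void sketch(const double *A, int m, int n, int d,
                double *S, double *B)
    {
        for (int i = 0; i < d * m; i++)
            S[i] = gauss() / sqrt((double)d);
        for (int i = 0; i < d; i++)
            for (int j = 0; j < n; j++) {
                double acc = 0.0;
                for (int k = 0; k < m; k++)
                    acc += S[i * m + k] * A[k * n + j];
                B[i * n + j] = acc;
            }
    }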

Subsubsection 5.2.9.3 Auto-generated GEMM kernels for RISC-V RVV

Stepan Nassyr
Jülich Supercomputing Centre

Related materials: slides/Generating_GEMM_for_RISC_V_RVV.pdf
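
To give the flavor of kernel auto-generation (a hypothetical generator, unrelated to the speaker's actual tooling): instead of hand-writing one microkernel per vector length and unroll factor, a small program emits the source for each variant. The sketch below prints a fully unrolled MR x NR GEMM microkernel as plain scalar C; a real generator for RISC-V RVV would emit vector intrinsics and choose MR/NR from the hardware vector length.

    #include <stdio.h>

    /* Emit an MR x NR microkernel over packed panels: A is stored
       k x MR (column of the block per step), B is k x NR, and C is
       updated by a rank-1 update per iteration of p. */
    static void emit_microkernel(int mr, int nr)
    {
        printf("static void gemm_ukr_%dx%d(int k, const double *a,\n"
               "                           const double *b, double *c)\n"
               "{\n", mr, nr);
        for (int i = 0; i < mr; i++)
            for (int j = 0; j < nr; j++)
                printf("    double c%d%d = c[%d];\n", i, j, i * nr + j);
        printf("    for (int p = 0; p < k; p++) {\n");
        for (int i = 0; i < mr; i++)
            for (int j = 0; j < nr; j++)
                printf("        c%d%d += a[p*%d + %d] * b[p*%d + %d];\n",
                       i, j, mr, i, nr, j);
        printf("    }\n");
        for (int i = 0; i < mr; i++)
            for (int j = 0; j < nr; j++)
                printf("    c[%d] = c%d%d;\n", i * nr + j, i, j);
        printf("}\n");
    }

    int main(void)
    {
        emit_microkernel(4, 4); /* print a 4x4 variant */
        return 0;
    }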

Subsubsection 5.2.9.4 Ask me anything

Field Van Zee
UT Austin

Subsubsection 5.2.9.5 Additional Discussion

