Section 5.2 Thursday Sept. 22
Subsection 5.2.1 Thursday 8:30 - 8:55: Coffee and muffins
Subsection 5.2.2 Thursday 8:55 - 9:00 Welcome by Robert van de Geijn
Subsection 5.2.3 Thursday 9:00 - 11:00 Session 1
Subsubsection 5.2.3.1 Exploring what is possible with BLIS
Devin Matthews, Southern Methodist University
Abstract:
BLIS is much more than just a BLAS implementation. Numerous intellectual and technical innovations within the BLIS framework make it possible to instantiate a far wider range of operations, and to (re-)combine algorithmic pieces in myriad ways without a combinatorial explosion of complexity or effort. In this talk, I discuss the nuts and bolts of how BLIS does and will continue to enable such a diverse repertoire of functionality, as well as some ideas for and potential issues in further development.
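As a concrete illustration of the flexibility alluded to above (not taken from the talk): BLIS's typed API exposes both a row stride and a column stride for every matrix operand, so a single gemm entry point covers column-major, row-major, and general-strided storage without transposition tricks. A minimal sketch, assuming BLIS is installed and blis.h is on the include path:

#include <blis.h>

int main( void )
{
    dim_t  m = 4, n = 4, k = 4;
    double alpha = 1.0, beta = 0.0;

    /* Row-major storage: row stride = number of columns, column stride = 1.
       A conventional column-major BLAS gemm cannot accept this layout
       directly; the BLIS typed API takes the strides as arguments. */
    double A[ 16 ], B[ 16 ], C[ 16 ];
    for ( int i = 0; i < 16; ++i ) { A[ i ] = 1.0; B[ i ] = 1.0; C[ i ] = 0.0; }

    bli_dgemm( BLIS_NO_TRANSPOSE, BLIS_NO_TRANSPOSE,
               m, n, k,
               &alpha, A, k, 1,    /* rs_a = k, cs_a = 1 (row-major) */
                       B, n, 1,    /* rs_b = n, cs_b = 1 (row-major) */
               &beta,  C, n, 1 );  /* rs_c = n, cs_c = 1 (row-major) */

    return 0;
}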
Subsubsection 5.2.3.2 AMD’s Contribution to BLIS
Meghana Vankadari, co-author Kiran Varaganti, AMD (India)
An overview of the various features, new APIs, and optimizations added by AMD to BLIS.
Subsubsection 5.2.3.3 User Mode Profiler for BLIS (DTL)
Dipal M Zambare, AMD (India)
Abstract:
A new Debug and Trace (DTL) feature has been added to AOCL-BLIS for application profiling and debugging. This presentation explains its features and how to use them.
Subsubsection 5.2.3.4 Multithreading SGEMV
Harihara Sudhan, co-author Bhaskar Nallani, AMD (India)
Abstract:
Parallelizing Level 2 routines efficiently by predicting the optimal number of threads required based on the input dimensions.
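The heuristic can be sketched as follows; this is a hypothetical illustration (the function name and thresholds are placeholders, not AMD's actual values). The idea is to pick a thread count from the matrix dimensions before dispatching SGEMV, so that small problems are not swamped by threading overhead.

#include <omp.h>

/* Hypothetical heuristic: choose a thread count for SGEMV (y := A*x, with A
 * of size m x n) from the problem size.  The thresholds below are purely
 * illustrative; a real implementation would tune them per architecture. */
static int sgemv_num_threads( int m, int n, int max_threads )
{
    long long flops = 2LL * m * n;   /* approximate work in the operation */

    if      ( flops < ( 1LL << 16 ) ) return 1;   /* tiny: stay serial      */
    else if ( flops < ( 1LL << 20 ) ) return max_threads < 4 ? max_threads : 4;
    else                              return max_threads;  /* large problem */
}

/* Usage sketch: fix the OpenMP thread count before calling the threaded
 * SGEMV kernel (or pass it to the library's per-call threading interface). */
void example( int m, int n )
{
    int nt = sgemv_num_threads( m, n, omp_get_max_threads() );
    omp_set_num_threads( nt );
    /* ... call sgemv here ... */
}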
Subsection 5.2.4 Thursday 11:00-11:15 Break
Subsection 5.2.5 Thursday 11:15 - 12:45 Session 2
Subsubsection 5.2.5.1 Ask me anything
Field Van Zee, The University of Texas at Austin
We can share a link to the video upon request. (Send mail to rvdg@cs.utexas.edu).
Subsubsection 5.2.5.2 Towards BLAS 3 robust solvers in LAPACK
Angelika Schwarz, Intel
Abstract:
The computation of eigenvectors and condition number estimation require the solution of triangular systems, which are known to be prone to floating-point overflow. To avoid overflow, LAPACK contains a set of robust BLAS-2-based solvers. The solvers use dynamic scaling to avoid introducing Infinity into the solution. Recently, BLAS-3 versions of these solvers have been devised. This presentation gives an overview of the progress of integrating the BLAS-3 solvers into LAPACK.
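To make the dynamic-scaling idea concrete, here is a heavily simplified sketch in the spirit of LAPACK's robust BLAS-2 solvers (e.g., xLATRS). It is not the LAPACK algorithm (it ignores growth in the update step and singular diagonals, and the threshold is crude), but it shows how a scaling factor is threaded through a back substitution so that the routine returns x and scale satisfying U*x = scale*b instead of overflowing.

#include <math.h>
#include <float.h>

/* Simplified robust upper-triangular solve: on entry x holds b; on exit
 * x and *scale (0 < *scale <= 1) satisfy U*x = (*scale)*b approximately,
 * with the scaling chosen so that no division overflows. */
void robust_trsv_upper( int n, const double *U, int ldu, double *x, double *scale )
{
    const double big = DBL_MAX / 2.0;   /* crude overflow threshold */
    *scale = 1.0;

    for ( int j = n - 1; j >= 0; --j )
    {
        double ujj = U[ j + j * ldu ];

        /* If x[j]/ujj would exceed the threshold, rescale the whole
           right-hand side first and record the factor. */
        if ( fabs( x[ j ] ) > fabs( ujj ) * big )
        {
            double s = ( fabs( ujj ) * big ) / fabs( x[ j ] );
            for ( int i = 0; i < n; ++i ) x[ i ] *= s;
            *scale *= s;
        }

        x[ j ] /= ujj;

        /* Update the remaining right-hand side entries. */
        for ( int i = 0; i < j; ++i )
            x[ i ] -= U[ i + j * ldu ] * x[ j ];
    }
}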
Subsubsection 5.2.5.3 Libflame - no more "0 users, 0 complaints"
Robert van de Geijn, The University of Texas at Austin
Abstract:
There has been a renewed interest in a previous project of ours, which resulted in another software artifact: the libflame library. We briefly review the foundational research behind the effort, how it relates to BLIS, and the state of the library.
This talk will be short and is meant to instigate a discussion of what efforts are being undertaken outside our core team and how this can be coordinated.
Subsection 5.2.6 Thursday 12:45 - 1:45 Lunch
Subsection 5.2.7 Thursday 1:45 - 3:15 Session 3
Subsubsection 5.2.7.1 Accelerators in FLAMEs
Johannes Dieterich, AMD (Austin)
Subsubsection 5.2.7.2 The new performance landscape of finite element methods for fluids and structures
Jed Brown, University of Colorado at Boulder
Subsubsection 5.2.7.3 Tensor-Times-Vector: a Use-Case for "Loop-over-BLIS"
Cem Bassoy, Technical University of Hamburg
Subsection 5.2.8 Thursday 3:15 - 3:30 Break
Subsection 5.2.9 Thursday 3:30 - 5:00 Session 4
Subsubsection 5.2.9.1 A case for BLIS for X
Tze Meng Low, Carnegie Mellon University
Subsubsection 5.2.9.2 Cascading GEMM
Devangi Parikh and Greg Henry, The University of Texas at Austin
Abstract:
In this talk, we will discuss the opportunities for implementing higher-precision matrix-matrix multiplication (GEMM) using lower-precision high-performance GEMM. We illustrate these ideas using double-double precision (FP64x2) GEMM as an example. We leverage the BLIS framework to approximate FP64x2 GEMM accuracy; the computation can be cast in terms of ten FP64 GEMMs by cascading the input matrices into four FP64 matrices. We show results that represent significant improvements over previous years, for both performance and accuracy, on both well-conditioned and ill-conditioned matrices.
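The splitting idea behind the cascade can be sketched in a much-simplified form; the following is not the talk's ten-GEMM scheme, and the function is hypothetical. Each double-double matrix is stored as separate arrays of high and low FP64 words, and the partial products are accumulated with ordinary FP64 GEMM calls.

#include <cblas.h>

/* a_hi/a_lo are m x k and b_hi/b_lo are k x n column-major FP64 arrays
 * holding the high and low words of each double-double entry; c accumulates
 * an FP64 approximation of the FP64x2 product A*B, using the expansion
 *   A*B = (A_hi + A_lo)*(B_hi + B_lo)
 *       = A_hi*B_hi + A_hi*B_lo + A_lo*B_hi + A_lo*B_lo,
 * where the last term is dropped as below working precision in this sketch. */
void cascaded_gemm_sketch( int m, int n, int k,
                           const double *a_hi, const double *a_lo,
                           const double *b_hi, const double *b_lo,
                           double *c )
{
    /* c  = A_hi * B_hi */
    cblas_dgemm( CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k,
                 1.0, a_hi, m, b_hi, k, 0.0, c, m );
    /* c += A_hi * B_lo */
    cblas_dgemm( CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k,
                 1.0, a_hi, m, b_lo, k, 1.0, c, m );
    /* c += A_lo * B_hi */
    cblas_dgemm( CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k,
                 1.0, a_lo, m, b_hi, k, 1.0, c, m );
}

The full scheme described in the abstract recovers FP64x2 accuracy by splitting more finely and accumulating the partial products with error-free transformations, at the cost of ten FP64 GEMMs.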
Subsubsection 5.2.9.3 Exception handling for the BLAS and LAPACK
Weslley da Silva Pereira, University of Colorado Denver (collaborative work with Julien Langou and Jim Demmel)
Abstract:
Numerical exceptions, which may be caused by overflow, operations like division by 0 or sqrt(-1), or convergence failures, are unavoidable in many cases, in particular when software is used on unforeseen and difficult inputs. As more aspects of society become automated, e.g., self-driving cars, health monitors, and cyber-physical systems more generally, it is becoming increasingly important to design software that is resilient to exceptions, and that responds to them in a consistent way. Consistency is needed to allow users to build higher-level software that is also resilient and consistent (and so on recursively). In this talk, we explore the design space of consistent exception handling for the widely used BLAS and LAPACK linear algebra libraries, pointing out a variety of instances of inconsistent exception handling in the current versions, and propose a new design that balances consistency, complexity, ease of use, and performance. Some compromises are needed, because there are preexisting inconsistencies that are outside our control, including in or between existing vendor BLAS implementations, different programming languages, and even compilers for the same programming language. And user requests from our surveys are quite diverse. We also propose our design as a possible model for other numerical software, and welcome comments on our design choices.
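As a small illustration (not from the talk) of the problem that a consistent design would address: whether an Inf or NaN in an input actually reaches the output of a BLAS call can depend on how an implementation blocks the computation or skips multiplies by zero, so portable callers today often resort to explicit checks such as the following sketch.

#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every entry of the length-n array x is finite (no Inf/NaN).
 * A caller might apply this to the inputs and/or outputs of a BLAS call as a
 * stopgap, since exception propagation differs between implementations. */
static bool all_finite( const double *x, size_t n )
{
    for ( size_t i = 0; i < n; ++i )
        if ( !isfinite( x[ i ] ) )
            return false;
    return true;
}

For example, calling all_finite( C, (size_t) m * n ) after a GEMM detects whether an exceptional value survived the computation, regardless of how the underlying library handled it internally.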
Slides and links (listed in program order):
slides/BLIS_Retreat_AMD_Contribution_To_BLIS_MeghanaAndKiran.pdf
slides/BLIS_Retreat_AMD_AOCL_DTL_Dipal.pdf
slides/BLIS_Retreat_AMD_Multithreading_SGEMV_HariAndBhasker.pdf
slides/Angelika_slides.pdf
slides/Robert_slides.pdf
slides/blis_retreat_Johannes.pdf
https://arxiv.org/abs/2204.01722
slides/Cem_slides.pdf
slides/Devangi_slides.pdf
slides/Weslley_slides.pdf