BLIS Retreat 2016

Program (First Draft)

Contributed talks

Monday Sept. 19

Morning	POB 2.402
8:30-9:00	Breakfast (POB 2.402)
9:00-9:30	BLIS: Year In Review, 2015-2016	SLIDES	Field Van Zee, UT-Austin
9:30-10:10	Implementing Strassen-like Fast Matrix Multiplication Algorithms with BLIS	SLIDES	Jianyu Huang and Leslie Rice, UT-Austin
10:10-10:40	A New I/O Lower Bound for GEMM with a Tight Constant	For SLIDES contact speaker	Tyler Smith, UT-Austin
10:30-11:00	Coffee (POB 2.402)
11:00-11:30	Scalable Dense Matrix Multiplication on Multi-Socket Many-Core Systems with Fast Shared Memory	SLIDES	Natalia Vassilieva, Hewlett Packard Labs
11:30-12:00	An Implementation of GEMM for DMA-enabled Architectures	SLIDES	Devangi Parikh, TI
12:00-12:30	BLAS for Deep Learning: tuple, mixed-precision, fixed-point, and binary GEMM	SLIDES	Marat Dukhan, GATech

12:30-2:00	Lunch (GDC 6.302, Computer Science Faculty Lounge)

Afternoon	POB 2.402
2:00-2:30	A Case Study for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization	SLIDES	Sandra Catalan, Univ. Jaume I
2:30-3:00	libFLAME Optimizations with BLIS	SLIDES	Kiran Varaganti, AMD
3:00-3:30	Extended BLAS, Integer BLAS, Batched BLAS, etc.		Greg Henry, Intel
3:30-4:00	Coffee (POB 2.402)
4:00-4:30	An Algorithmic Specific Code Generator for Matrix-Matrix Multiply-Like Operations	SLIDES	Richard Veras, CMU
4:30-5:00	A BLIS affair with FPGAs	SLIDES (for VIDEO contact speaker)	Tze Meng Low
5:00-5:30	Cl1ck + LGen: FLAME for small scale linear algebra	SLIDES	Diego Fabregat, RWTH-Aachen University

Tuesday Sept. 20

Morning	POB 2.402
8:30-9:00	Breakfast (POB 2.402)
9:00-9:30	Tensor Contraction with BLIS	SLIDES	Devin Matthews, UT-Austin
9:30-10:00	Design of a high-performance GEMM-like Tensor-Tensor Multiplication	SLIDES	Paul Springer, RWTH-Aachen University
10:00-10:30	Using BLIS for tensor computations in Q-Chem	SLIDES	Evgeny Epifanovsky, Q-Chem
10:30-11:00	Coffee (POB 2.402)
11:00-11:30	PeachPy.io: a platform for crowdsourcing performance tuning	SLIDES	Marat Dukhan, GATech
11:30-12:00	Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era	SLIDES	Ardavan Pedram, Movidius and Stanford
12:00-12:30	A set of high performance kernel matrix operations on CPU, KNL and GPU	SLIDES	Chenhan Yu, UT-Austin
12:30-12:45	Closing comments		Robert van de Geijn, UT-Austin