Learning optimal tile sizes using neural networks

Project contact: Roshan Dathathri

Project description: Affine loop nests are arbitrarily nested loops in which the array accesses and the loop bounds are affine functions (linear functions plus a constant offset) of the surrounding loop indices and the program parameters. They form the compute-intensive core of scientific computations such as linear-algebra kernels and stencil computations. Compilers based on the polyhedral model can statically analyze these loop nests and generate parallel, tiled code without programmer intervention. However, the performance of the generated code is highly sensitive to the tile sizes, which depend on the machine architecture (but not on the problem sizes). The programmer therefore has to choose the right tile sizes to get the best performance, and exhaustively searching for the best tile sizes (auto-tuning) is time-consuming.
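
For concreteness, here is a minimal sketch of what tiling does, using matrix multiplication as the running example (the sketch is in Python for brevity; a polyhedral compiler would emit analogous C). The tile size T below is exactly the parameter this project aims to predict.

    def matmul(A, B, C, N):
        # Original affine loop nest: bounds and accesses are affine in i, j, k, N.
        for i in range(N):
            for j in range(N):
                for k in range(N):
                    C[i][j] += A[i][k] * B[k][j]

    def matmul_tiled(A, B, C, N, T):
        # The same nest after tiling with tile size T: the outer loops walk
        # over T x T x T tiles, the inner loops scan the points inside one
        # tile, so each tile's working set can stay resident in cache.
        for ii in range(0, N, T):
            for jj in range(0, N, T):
                for kk in range(0, N, T):
                    for i in range(ii, min(ii + T, N)):
                        for j in range(jj, min(jj + T, N)):
                            for k in range(kk, min(kk + T, N)):
                                C[i][j] += A[i][k] * B[k][j]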

Analytical models have been proposed to compute the best tile sizes. These models meticulously analyze both machine architecture features and program features to derive an analytical function that gives the optimal tile sizes in terms of the values of those features. Yotov et al. proposed an analytical model for determining optimal tile sizes for matrix multiplication and showed that it performs almost as well as exhaustive search. However, their model is specific to one problem, matrix multiplication; extending it to other affine loop nests would require analyzing each problem independently.
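
To make "an analytical function of the features" concrete, here is a toy capacity heuristic for matrix multiplication. It is only an illustration of the idea, not Yotov et al.'s model, which accounts for many more features (registers, cache associativity, line size, and so on); the fraction parameter below is an assumed safety margin.

    import math

    def capacity_tile_size(cache_bytes, elem_bytes=8, fraction=0.75):
        # Toy capacity model: choose T so the working set of one matmul tile,
        # roughly three T x T blocks (one each from A, B, and C), occupies at
        # most a fraction of the cache:
        #     3 * T * T * elem_bytes <= fraction * cache_bytes
        return int(math.sqrt(fraction * cache_bytes / (3 * elem_bytes)))

    print(capacity_tile_size(32 * 1024))  # ~32 for a 32 KiB L1 data cache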

Recent work has tried to use machine learning techniques to learn optimal tile sizes automatically. Cummins et al. in ADAPT 2016 use classifiers and regressors to predict workgroup (i.e., tile) sizes for stencil kernels on GPUs. The advantage is that the best tile sizes for different stencil problems can be learned automatically rather than modeled by hand. Neural networks seem well suited to learning tile sizes because the functions that map features to optimal tile sizes are usually nonlinear.
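
As a sketch of the kind of learner such a system could start from, the following trains a small multi-layer perceptron that maps a feature vector to a tile size. The feature set, training data, and labels below are entirely hypothetical stand-ins; in practice the labels would come from auto-tuning runs on real kernels and machines.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)

    # Hypothetical features per sample: [loop depth, bytes accessed per
    # iteration, estimated reuse distance, L1 size in KiB, number of cores].
    X = rng.uniform([2, 8, 1, 16, 2], [6, 128, 64, 64, 32], size=(500, 5))

    # Stand-in labels: the best tile size auto-tuning would have found,
    # faked here as a nonlinear function of the features plus noise.
    y = np.sqrt(X[:, 3] * 1024 / (3 * X[:, 1])) + rng.normal(0, 1, 500)

    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                         random_state=0)
    model.fit(X, y)

    # Predict a tile size for a new (hypothetical) loop-nest/machine pair.
    print(model.predict([[3, 24, 8, 32, 8]]))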

The goal of this project is to build a machine learning system, possibly based on deep neural networks, that takes affine loop nest features and machine architecture features as inputs and predicts the best tile sizes for that loop nest on that machine. To begin with, you can restrict yourself to perfectly nested loops. The deliverables below set out the milestones, and the papers at the end are good starting points.

Project deliverables and deadlines:
  1. (Nov 1) A clear statement in English describing your project proposal.
  2. (Nov 8) A survey of analytical models, polyhedral compilers, and neural networks.
  3. (Dec 6) A tool that takes loop nest features and machine features as input and outputs the tile sizes to use for that loop nest on that machine.
  4. (Dec 6) A project report, written in the style of an ACM conference paper, that summarizes the work you did.

Papers:

  1. Is Search Really Necessary to Generate High-Performance BLAS? Kamen Yotov, Xiaoming Li, Gang Ren, Maria Garzaran, David Padua, Keshav Pingali, Paul Stodghill. Proceedings of the IEEE, 93(2), 2005.
  2. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer. Uday Bondhugula, Albert Hartono, J. Ramanujam, P. Sadayappan. PLDI 2008.
  3. Autotuning OpenCL Workgroup Size for Stencil Patterns. Chris Cummins, Pavlos Petoumenos, Michel Steuwer, Hugh Leather. In Proceedings of the 6th International Workshop on Adaptive Self-tuning Computing Systems (ADAPT'16).
  4. Introduction to Neural Networks. Yaser Abu-Mostafa. An online course lecture featured on edX.