CSE 392/CS 378 Assignment 3

Due: October 30th, 2011

Introduction

In this assignment, you will implement Matrix-Matrix Multiplication(MMM) and try to optimize it using the techniques seen in class, mentioned in the references or any other tricks you can come up with! You can work in pairs.

Instructions

  1. Visit http://z.cs.utexas.edu/wiki/LA.wiki/OptimizingGemm/
  2. Follow the instructions under "Setup" on your favorite X86 (Intel) machine that runs Linux. It would be good to do so on a machine that has matlab installed.
  3. Try the optimizations that are suggested, following the directions.
  4. Next, try to optimize beyond what is on that wiki page. Target just one core (no OpenMP, pthreads, etc.)
  5. Some possibilities:
  6. Finally, turn in your best implementation.
  7. Turn in one file named _MMult.c with in it a routine named _MMult with a calling sequence identical to that of MY_MMult in MMult0.c. So, I would turn in RVDG_MMult.c with in it the routine "void RVDG_MMult( etc )"
  8. Your implementation needs only get the right answer for matrix sizes that are multiples of 8. So, you don't need to worry about what to do when the matrices are not nice multiples of some block size for which you optimized.
  9. The best implementations of everyone in the class will be run by Sarat on a single core of longhorn (the machine at TACC on which you have an account) and a plot will be made showing how people did.

Note

References

  1. Goto, K., and van de Geijn, R. A. 2008. Anatomy of High-Performance Matrix Multiplication, ACM Transactions on Mathematical Software 34, 3, Article 12.
  2. Chellappa, S., Franchetti, F., and Pü, M. 2008. How To Write Fast Numerical Code: A Small Introduction, Lecture Notes in Computer Science 5235, 196.259.
  3. Bilmes, et al. The PHiPAC (Portable High Performance ANSI C) Page for BLAS3 Compatible Fast Matrix Matrix Multiply.
  4. Lam, M. S., Rothberg, E. E, and Wolf, M. E. 1991.The Cache Performance and Optimization of Blocked Algorithms, ASPLOS'91, 63.74.
  5. Is Search Really Necessary to Generate High-Performance BLAS?

Submission

You should submit your code( a single .c file) as described in the Instructions section and a pdf document(about half a page in length) describing the optimizations that you used. Create a zip file and email it to saratrallapalli@gmail.com before 11.59 P.M. on 30th October. Subject of the email should be [hpc-hw3] Last Name/ First Name