CS378: High-Performance Parallel Computing
Assignment 1: Optimize matrix-matrix multiplication
-
Files
-
Instructions:
-
Download and run the basic code:
-
Log on to an Intel Pentium(R) III-based machine.
-
Create a subdirectory for all of your assignments.
-
Download the files assign1.tar.gz and ATLAS.tar.gz
-
Uncompress and untar the two files assign1.tar.gz and ATLAS.tar.gz:
-
gunzip *.gz
-
tar -xf assign1.tar
-
tar -xf ATLAS.tar
Subdirectory ATLAS will now contain a set of Basic Linear
Algebra Subprograms (BLAS) for the Pentium III.
Subdirectory assign1 will now contain a number of files
associated with matrix-matrix multiplication.
-
cd assign1
-
Edit the first line of the file Makefile so that it gives
the path to the ATLAS directory.
-
Type "make". This will create the executable test_gemm.x.
-
Type "./test_gemm.x". This will execute a driver routine for
timing the matrix-matrix multiply.
-
Enter the number of repeats. The time reported for each
matrix-matrix multiply will be the shortest time measured
over that many repeats.
-
Enter the range of problem sizes to be timed as
"first last step". For example, "50 500 50" will time
all matrix problems from 50x50 to 500x500 in steps of 50.
-
Enter "-1" to quit the execution.
-
Start a MATLAB session by typing "matlab"
(if MATLAB is installed).
-
Cut and paste the output from test_gemm.x into MATLAB to
create a graph.
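The "shortest time over the repeats" rule and the "first last step" range can be sketched in C as follows. This is only an illustration of the idea, not the code of the supplied driver: the names best_time, count_call and count_sizes are hypothetical, and clock() is assumed to be an acceptable timer.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper (not part of the assignment files):
   run work(arg) `repeats` times and return the shortest
   CPU time of any single run, in seconds. */
double best_time(void (*work)(void *), void *arg, int repeats)
{
    assert(repeats > 0);
    double best = -1.0;                 /* -1 means "no measurement yet" */
    for (int r = 0; r < repeats; r++) {
        clock_t start = clock();
        work(arg);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;             /* keep the minimum */
    }
    return best;
}

/* Example workload for trying out best_time: counts its own calls. */
void count_call(void *arg)
{
    ++*(int *)arg;
}

/* Number of problem sizes visited by a range "first last step". */
int count_sizes(int first, int last, int step)
{
    assert(step > 0);
    return (last < first) ? 0 : (last - first) / step + 1;
}
```

Taking the minimum rather than the average discards runs that were slowed down by other activity on the machine. With the range "50 500 50", count_sizes(50, 500, 50) gives the ten sizes 50, 100, ..., 500.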
-
Optimize the matrix-matrix multiplication in
my_gemm.c .
This is the routine that is reported as "ME" in the graph.
Some possible optimizations:
-
Rewrite in terms of dot-products and use the
CBLAS (C interface to the Basic Linear Algebra Subprograms)
routine cblas_ddot.
-
Rewrite in terms of "axpy" operations (y := alpha*x + y)
and use the CBLAS routine cblas_daxpy.
-
Rewrite in terms of matrix-vector multiplication
and use the CBLAS routine cblas_dgemv.
-
No, you are not allowed to simply make it a call
to the CBLAS routine cblas_dgemm, which implements
the matrix-matrix multiply.
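The three reformulations above can be sketched as follows. This is a minimal sketch under stated assumptions, not the contents of my_gemm.c: it assumes column-major storage with leading dimensions lda, ldb, ldc, and it assumes the operation is C := A*B + C with A m-by-k and B k-by-n. So that the sketch compiles without ATLAS, the helpers ref_ddot, ref_daxpy and ref_dgemv are local stand-ins (hypothetical names) for the corresponding CBLAS routines.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: column-major storage; entry (i,j) of a matrix
   with leading dimension ld. */
#define ELEM(M, ld, i, j) ((M)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Stand-in for cblas_ddot: returns the dot product of x and y. */
static double ref_ddot(int n, const double *x, int incx,
                       const double *y, int incy)
{
    assert(incx > 0 && incy > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[(size_t)i * incx] * y[(size_t)i * incy];
    return sum;
}

/* Stand-in for cblas_daxpy: y := alpha*x + y. */
static void ref_daxpy(int n, double alpha, const double *x, int incx,
                      double *y, int incy)
{
    assert(incx > 0 && incy > 0);
    for (int i = 0; i < n; i++)
        y[(size_t)i * incy] += alpha * x[(size_t)i * incx];
}

/* Stand-in for cblas_dgemv (no transpose, beta = 1):
   y := alpha*A*x + y, where A is m-by-k. */
static void ref_dgemv(int m, int k, double alpha, const double *A, int lda,
                      const double *x, int incx, double *y, int incy)
{
    for (int p = 0; p < k; p++)
        ref_daxpy(m, alpha * x[(size_t)p * incx],
                  &ELEM(A, lda, 0, p), 1, y, incy);
}

/* Dot-product form: C(i,j) += (row i of A) . (column j of B).
   A row of a column-major matrix has stride lda. */
void gemm_dot(int m, int n, int k, const double *A, int lda,
              const double *B, int ldb, double *C, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            ELEM(C, ldc, i, j) += ref_ddot(k, &ELEM(A, lda, i, 0), lda,
                                           &ELEM(B, ldb, 0, j), 1);
}

/* axpy form: (column j of C) += B(p,j) * (column p of A). */
void gemm_axpy(int m, int n, int k, const double *A, int lda,
               const double *B, int ldb, double *C, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int p = 0; p < k; p++)
            ref_daxpy(m, ELEM(B, ldb, p, j),
                      &ELEM(A, lda, 0, p), 1, &ELEM(C, ldc, 0, j), 1);
}

/* Matrix-vector form: (column j of C) += A * (column j of B). */
void gemm_gemv(int m, int n, int k, const double *A, int lda,
               const double *B, int ldb, double *C, int ldc)
{
    for (int j = 0; j < n; j++)
        ref_dgemv(m, k, 1.0, A, lda,
                  &ELEM(B, ldb, 0, j), 1, &ELEM(C, ldc, 0, j), 1);
}
```

With ATLAS linked and cblas.h included, ref_ddot(k, x, incx, y, incy) corresponds to cblas_ddot(k, x, incx, y, incy), ref_daxpy(m, alpha, x, 1, y, 1) to cblas_daxpy(m, alpha, x, 1, y, 1), and ref_dgemv(m, k, 1.0, A, lda, x, 1, y, 1) to cblas_dgemv(CblasColMajor, CblasNoTrans, m, k, 1.0, A, lda, x, 1, 1.0, y, 1). If my_gemm.c stores matrices differently, the strides and the order argument would need to change accordingly.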
-
You may wish to read