CS 378: Homework 4

Due: April 24th, 2008

This assignment is optional. You should do it only if you want to make up for a bad grade in a previous assignment.

Introduction

The goal of this assignment is to implement dense matrix-vector multiplication (MVM) on Lonestar in two different ways, and measure the performance of both implementations.

In this write-up, we will refer to the MVM as A*x, where A is the matrix and x is the vector. You can assume that the matrix A is square, of size n x n. Recall from the class discussion that we usually do not perform a single MVM in isolation. Rather, we usually want to perform a large number of MVMs with the same matrix A and different vectors x1, x2, ..., where the vector x2 is obtained from the vector A*x1, the vector x3 is obtained from the vector A*x2, and so on. Our MVM routine will therefore be optimized for this case: we will partition the matrix A among the processes just once, and we will require that the distribution of the result vector A*x be the same as the distribution of the vector x.

One approach to building the distributed matrix is to create the entire matrix on the root process, and then have the root process send the appropriate portions to the other processes. This approach limits the size of the matrix you can work with, since the whole matrix must fit in the memory of one process, so a better approach is to have each process create its own portion of the global matrix. In a real code, this could be accomplished by having each process read its own portion of the global matrix from disk, or by having some other program, such as the finite-element mesh generator and formulator, create the appropriate portion of the global matrix in the local memory of each process. We will do something simpler: the root process should broadcast the size of the global matrix to the other processes, and each process should then allocate a local sub-matrix of the appropriate size and initialize its entries as follows: A(i,j) = (i+j), where i and j are indices into the global matrix. Each process should create its own piece of the vector x as well, initializing all elements to 1. Both the matrix and the vector should contain doubles.
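The local initialization described above could be sketched as follows. This is only an illustration, not a required design: the helper names (block_start, block_size, init_local_rows, init_local_x) are hypothetical, the indices are assumed to be 0-based, and the matrix is assumed to be split into contiguous blocks of rows stored in row-major order. In the actual MPI program, n would arrive via MPI_Bcast, and rank and p would come from MPI_Comm_rank and MPI_Comm_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: first global index owned by `rank` when n items are
   split into p contiguous blocks. The first n % p ranks get one extra item.
   Calling it with rank == p returns n, which is handy for computing sizes. */
int block_start(int n, int p, int rank) {
    assert(n >= 0 && p > 0 && rank >= 0 && rank <= p);
    int base = n / p;
    int extra = n % p;
    return rank * base + (rank < extra ? rank : extra);
}

/* Number of items owned by `rank`. */
int block_size(int n, int p, int rank) {
    return block_start(n, p, rank + 1) - block_start(n, p, rank);
}

/* Allocate and fill this rank's block of rows, with A(i,j) = i + j using
   global 0-based indices (an assumption; the handout does not fix the base).
   Returns NULL if allocation fails; the caller frees the result. */
double *init_local_rows(int n, int p, int rank) {
    int first = block_start(n, p, rank);
    int rows = block_size(n, p, rank);
    size_t count = (size_t)rows * (size_t)n;
    double *a = malloc((count ? count : 1) * sizeof *a);
    if (a == NULL) return NULL;
    for (int li = 0; li < rows; ++li)
        for (int j = 0; j < n; ++j)
            a[(size_t)li * (size_t)n + (size_t)j] = (double)(first + li + j);
    return a;
}

/* Allocate this rank's piece of x and set every element to 1. */
double *init_local_x(int n, int p, int rank) {
    int len = block_size(n, p, rank);
    double *x = malloc((len ? (size_t)len : 1) * sizeof *x);
    if (x == NULL) return NULL;
    for (int i = 0; i < len; ++i)
        x[i] = 1.0;
    return x;
}
```

With n = 10 and p = 4, for example, this split gives ranks 0 and 1 three rows each and ranks 2 and 3 two rows each. Because the same helper sizes both the rows of A and the piece of x, the two distributions match.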

Problem 1 -- Block-row distribution of matrix

Write an MPI program for MVM in which the matrix A is distributed by block row, and the vector x has a matching block distribution.
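As a starting point, the per-process computation for the block-row case might look like the sketch below. It is not a full solution and contains no MPI calls: it assumes that the complete vector x has already been assembled on every process (for instance with MPI_Allgatherv), and the function name local_row_matvec is hypothetical.

```c
#include <stddef.h>

/* Compute y_local = A_local * x_full.
   a_local : `rows` consecutive rows of the n x n matrix, row-major.
   x_full  : the complete vector x of length n (assumed already gathered).
   y_local : output of length `rows`; it covers the same index range as this
             process's piece of x, so the result has the same distribution. */
void local_row_matvec(const double *a_local, int rows, int n,
                      const double *x_full, double *y_local) {
    for (int i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += a_local[(size_t)i * (size_t)n + (size_t)j] * x_full[j];
        y_local[i] = sum;
    }
}
```

Since each process ends up with exactly the entries of A*x that correspond to its own rows, the output can serve directly as the local piece of the next input vector.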

Problem 2 -- Block/block distribution of matrix

Repeat Problem 1 for a block/block distribution of the matrix A and the corresponding distribution of the vector x.
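For the block/block case, each process holds a rectangular block of A and can only compute a partial sum for its rows. A minimal sketch of that local step is given below, under the assumption that the processes form a two-dimensional grid and that each process already holds the piece of x matching its block of columns. The names block_partial_matvec and accumulate are hypothetical, and accumulate is only a serial stand-in for the sum that would be carried out across a process row (for example with MPI_Reduce on a row communicator).

```c
#include <stddef.h>

/* Partial product for one block of A:
   y_partial[i] = sum over j of a_block(i,j) * x_piece[j].
   a_block : rows x cols block of A, row-major.
   x_piece : the `cols` entries of x that match this block's columns. */
void block_partial_matvec(const double *a_block, int rows, int cols,
                          const double *x_piece, double *y_partial) {
    for (int i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (int j = 0; j < cols; ++j)
            sum += a_block[(size_t)i * (size_t)cols + (size_t)j] * x_piece[j];
        y_partial[i] = sum;
    }
}

/* Serial stand-in for the reduction along a process row: acc += part. */
void accumulate(double *acc, const double *part, int rows) {
    for (int i = 0; i < rows; ++i)
        acc[i] += part[i];
}
```

Summing the partial results of all blocks in one process row yields the corresponding entries of A*x. How those entries are then redistributed to match the distribution of x is left open here, since it depends on the vector distribution you choose.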

Problem 3 -- Conclusions


Answer the following question.