Due: April 24th, 2008
This assignment is optional. You should do it only if you want to
make up for a bad grade in a previous assignment.
The goal of this assignment is to implement dense matrix-vector
multiplication (MVM) on Lonestar in two different ways and to measure
the performance of both implementations.
In this write-up, we will refer to the MVM as A*x, where A is the
matrix and x is the vector. You can assume that the matrix A is square
and of size n x n. Recall from the class discussion that we usually do
not perform a single MVM in isolation; rather, we usually want to
perform a large number of MVMs with the same matrix A and different
vectors x1, x2, ..., where x2 = A*x1, x3 = A*x2, and so on. Our MVM
routine will therefore be optimized for this case: we will partition
the matrix A among the processes just once, and we also require that
the distribution of the vector A*x be the same as the distribution of
the vector x, so that the result of one MVM can be used as the input
to the next without being redistributed.
One approach to building the distributed matrix is to create the
entire matrix on the root process and then have the root process send
the appropriate portions to the other processes. This approach limits
the size of the matrix you can work with, since the whole matrix must
fit in the memory of the root process, so a better approach is to have
each process create its own portion of the global matrix. In a real
code, this could be accomplished by having each process read its own
portion of the global matrix from disk, or by having some other
program, such as a finite-element mesh generator and formulator,
create the appropriate portion of the global matrix in the local
memory of each processor. We will do something simpler: the root
process should broadcast the size of the global matrix to the other
processes, and each process should then allocate a local sub-matrix of
the appropriate size and initialize its entries as A(i,j) = i + j,
where i and j are global row and column indices. Each process should
create its own piece of the vector x as well, initializing all of its
elements to 1. Both the matrix and the vector should contain doubles.
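Problem 1: Write an MPI program for MVM in which the matrix A is
distributed by block rows and the vector x has the matching block
distribution, i.e., the process that owns a given block of rows of A
also owns the corresponding block of elements of x.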
Problem 2: Repeat Problem 1 for a block/block (two-dimensional)
distribution of the matrix A, in which each process owns one
rectangular block of A, and the corresponding distribution of the
vector x.