We start by explaining the simple case of matrix-vector multiplication:
where A is an
matrix.
For our explanations, we will assume
that matrix A and vectors x and y are
created according to the driver given in Example 2.3.
Thus, the template is created with a
communicator and a blocking
of
. Given this template,
vectors x and y are aligned with the first element
of the template vector and matrix A with the upper-left
element of the template matrix. This yields objects
a, x, and y.
As explained in Section 1.5, the following steps will perform the matrix-vector multiply y = A x :
We will now show how to translate these operations into PLAPACK code.
- spread (collect) the entries of x within columns of nodes,
- perform the local matrix-vector multiply, and
- perform a distributed reduce (summation) of the local results within rows, leaving the global result in vector y .
The mechanism used by PLAPACK to communicate is to describe the initial and final distribution as objects, and perform a copy or reduce. Thus, the following statements will perform the spread of x within columns of nodes:
PLA_Obj_datatype( a, &datatype );
PLA_Pvector_create( datatype, PLA_PROJ_ONTO_ROW, PLA_ALL_ROWS,
n, template, PLA_ALIGN_FIRST, &xdup );
PLA_Copy( x, xdup );
After this, all information is available locally to perform
the local matrix-vector multiply. Before doing so, we need to create
duplicated multiscalars to hold the constants ``0'' and ``1''.
Also, a duplicated projected vector (column) must be created
to hold the result:
PLA_Mscalar_create( datatype, PLA_ALL_ROWS, PLA_ALL_COLS, 1, 1, template, &one );
PLA_Obj_set_to_one( one );
PLA_Mscalar_create( datatype, PLA_ALL_ROWS, PLA_ALL_COLS, 1, 1, template, &zero );
PLA_Obj_set_to_zero( zero );
PLA_Pvector_create( datatype, PLA_PROJ_ONTO_COL, PLA_ALL_COLS,
m, template, PLA_ALIGN_FIRST, &ydup );
PLA_Local_gemv( PLA_NO_TRANSPOSE, one, a, xdup, zero, ydup );
Finally, the local results (in the different versions of the duplicated projected vector ydup) must be reduced into a single vector y :
PLA_Obj_set_to_zero( y ); PLA_Reduce( ydup, MPI_SUM, y );