Next: Forming Up: Example: Parallelizing Matrix-Matrix Multiplication Previous: Forming

## Forming

For this operation to be well-defined, we require matrices, A , B , and C to have dimensions , , and , respectively.

If we partition

then

for . Thus, a matrix-matrix multiply can be written as a sequence of matrix-vector multiplies

### Basic Parallel Algorithm:

We can implement the above sequence of matrix-vector multiplies by using views, splitting off one column and row at a time: Let C and B be partitioned like

where and equal the first column of C and first row of B . Then

So the computation of C can be viewed matrix-vector multiply of A times the first row of B , after which the operation can be performed, using the same approach, recursively.

This leads to the following algorithm:

• Scale
• Let and
• Do until done
• If and are empty, the loop is done.
• Partition

• Compute
• Let and
Again, the `` '' symbol is used to indicate assignment, while ``='' indicates a reference or partitioning.

### Blocked Algorithm:

In order to get better performance, it is advantageous to set up the algorithm to perform a sequence of matrix-panel-of-vector operations: Let C and B be partitioned like

where and now equal a panel of s columns. Then

So the computation of C can be viewed as a matrix-times-panel-of-vectors multiply ( A times the first panel of rows of B ), followed by a recursive operation using the same approach.

This leads to the following algorithm:

• Scale
• Let and
• Do until done
• Determine width s of split. If and are empty, the loop is done.
• Partition

and

• Compute
• Let and
Again, the `` '' symbol is used to indicate assignment, while ``='' indicates a reference or partitioning. The resulting code is given in Fig. . We will often refer to this approach to parallel matrix-matrix multiplication as a     matrix-panel variant, to indicate that the basic operation uses matrix-panel-of-vectors (rows or columns) multiplication.

PLACE BEGIN HR HERE

PLACE END HR HERE

### Pipelined Algorithm:

For large n , the above algorithm can be improved by introducing pipelining. This can be accomplished by setting the natural flow of computation to be to the right or left, for communication within rows, or to the top or bottom, for communication within columns.

Next: Forming Up: Example: Parallelizing Matrix-Matrix Multiplication Previous: Forming

rvdg@cs.utexas.edu