
General implementation

The above algorithm generalizes in a straightforward manner to y = A x, where x and y can have any valid vector distribution, including projected and/or duplicated. Some care must be taken in creating xdup and ydup: xdup must be aligned with the columns of a (distributed like a row of A, but duplicated), while ydup must be aligned with the rows of a (distributed like a column of A, but duplicated). In the simple implementation, a, x, and y were all already aligned, so this was not an issue. Creating xdup and ydup is now accomplished through the calls

PLA_Pvector_create_conf_to( a, PLA_PROJ_ONTO_ROW, PLA_ALL_ROWS, &xdup );
PLA_Pvector_create_conf_to( a, PLA_PROJ_ONTO_COL, PLA_ALL_COLS, &ydup );
After this, all required communication and alignment is hidden in the PLA_Copy and PLA_Reduce routines. Code that generalizes even further, implementing the full functionality of the sequential gemv operation, is given in the figure below.

[Figure: PLAPACK code for the generalized parallel gemv operation]
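The data flow described above (duplicate x along mesh columns, multiply locally, reduce along mesh rows) can be illustrated without PLAPACK or MPI. The following is a minimal sequential sketch, not PLAPACK code: it assumes a plain block partitioning of A over a small fixed mesh, which is simpler than PLAPACK's actual distribution, and the names mesh_gemv, MESH_ROWS, MESH_COLS, and BLK are hypothetical. The arrays xdup and ydup only mimic the roles of the objects of the same name in the text; the first loop stands in for PLA_Copy and the last for PLA_Reduce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mesh shape and block size for this sketch (not PLAPACK names). */
enum { MESH_ROWS = 2, MESH_COLS = 3, BLK = 2 };
enum { M = MESH_ROWS * BLK, N = MESH_COLS * BLK };

/*
 * Simulate y = A x on a MESH_ROWS x MESH_COLS mesh of nodes.
 * a is M x N, stored row-major; node (p, q) "owns" the BLK x BLK block
 * whose top-left element is a[(p*BLK)*N + q*BLK].
 */
void mesh_gemv(const double *a, const double *x, double *y)
{
    double xdup[MESH_ROWS][MESH_COLS][BLK]; /* per-node copy of a piece of x */
    double ydup[MESH_ROWS][MESH_COLS][BLK]; /* per-node local contribution   */

    /* Step 1 (role of the copy): block q of x is duplicated to every node
     * in mesh column q, so it lines up with the columns of the local block. */
    for (int p = 0; p < MESH_ROWS; ++p)
        for (int q = 0; q < MESH_COLS; ++q)
            for (int k = 0; k < BLK; ++k)
                xdup[p][q][k] = x[q * BLK + k];

    /* Step 2: each node multiplies its local block by its local xdup. */
    for (int p = 0; p < MESH_ROWS; ++p)
        for (int q = 0; q < MESH_COLS; ++q)
            for (int i = 0; i < BLK; ++i) {
                double sum = 0.0;
                for (int k = 0; k < BLK; ++k)
                    sum += a[(size_t)(p * BLK + i) * N + (q * BLK + k)]
                           * xdup[p][q][k];
                ydup[p][q][i] = sum;
            }

    /* Step 3 (role of the reduce): contributions are summed within each
     * mesh row to form the corresponding block of y. */
    for (int p = 0; p < MESH_ROWS; ++p)
        for (int i = 0; i < BLK; ++i) {
            double sum = 0.0;
            for (int q = 0; q < MESH_COLS; ++q)
                sum += ydup[p][q][i];
            y[p * BLK + i] = sum;
        }
}
```

Calling mesh_gemv(a, x, y) with a 4 x 6 row-major array a and a length-6 vector x fills the length-4 vector y with the same values as an ordinary matrix-vector product. The sketch makes the alignment requirement concrete: xdup is indexed by mesh column and ydup by mesh row, which is why the two objects are created with different projections.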



rvdg@cs.utexas.edu