SUBROUTINE PBDGEMM( ICONTXT, MATPOS, TRANSA, TRANSB, M, N, K, MB, $ NB, KB, ALPHA, A, LDA, B, LDB, BETA, C, LDC, $ IAROW, IACOL, IBROW, IBCOL, ICROW, ICCOL, $ BR1ST, ACOMM, AWORK, BCOMM, BWORK, CWORK, $ WORK ) * * -- PB-BLAS routine (version 2.1) -- * University of Tennessee, Knoxville, Oak Ridge National Laboratory. * April 28, 1996 * * Jaeyoung Choi, Oak Ridge National Laboratory * Jack Dongarra, University of Tennessee and Oak Ridge National Lab. * David Walker, Oak Ridge National Laboratory * * .. Scalar Arguments .. CHARACTER*1 ACOMM, AWORK, BCOMM, BR1ST, BWORK, CWORK, $ MATPOS, TRANSA, TRANSB INTEGER IACOL, IAROW, IBCOL, IBROW, ICCOL, ICONTXT, $ ICROW, K, KB, LDA, LDB, LDC, M, MB, N, NB DOUBLE PRECISION ALPHA, BETA * .. * .. Array Arguments .. DOUBLE PRECISION A( LDA, * ), B( LDB, * ), C( LDC, * ), $ WORK( * ) * .. * * Purpose * ======= * * PBDGEMM is a parallel blocked version of DGEMM. * PBDGEMM performs one of the matrix-matrix operations based on block * cyclic distribution. * * C := alpha*op( A )*op( B ) + beta*C, * * where op( X ) is one of * * op( X ) = X or op( X ) = X', * * alpha and beta are scalars, and A, B and C are matrices, with op( A ) * an m-by-k matrix, op( B ) a k-by-n matrix and C an m-by-n matrix. * One of m, n, k is limited to its block size. So two of A, B, C are * block column or block row and the other is a full block matrix. * * A and/or B can be broadcast, or C is collected if necessary, and its * communication scheme can be selected. * * The first elements of the matrices A, B, and C should be located at * the beginnings of their first blocks. (not the middle of the blocks.) * * Matrices should be aligned naturally. The first blocks of the * matrices should be located at the same column or row of a process * template if matrix transposition is not required duing the operation. * * Parameters * ========== * * ICONTXT (input) INTEGER * ICONTXT is the BLACS mechanism for partitioning communication * space. A defining property of a context is that a message in * a context cannot be sent or received in another context. The * BLACS context includes the definition of a grid, and each * process' coordinates in it. * * MATPOS (input) CHARACTER*1 * MATPOS specifies position of a matrix, which has to be A, B, * or C. * * MATPOS = 'A': A is a matrix * MATPOS = 'B': B is a matrix * MATPOS = 'C': C is a matrix * * TRANSA (input) CHARACTER*1 * TRANSA specifies the form of op( A ) to be used in the matrix * multiplication as follows: * * TRANSA = 'N': op( A ) = A. * TRANSA = 'T': op( A ) = A'. * TRANSA = 'C': op( A ) = A'. * * TRANSB (input) CHARACTER*1 * TRANSB specifies the form of op( B ) to be used in the matrix * multiplication as follows: * * TRANSB = 'N': op( B ) = B. * TRANSB = 'T': op( B ) = B'. * TRANSB = 'C': op( B ) = B'. * * M (input) INTEGER * M specifies the (global) number of rows of the matrix op( A ) * and of the matrix C. M >= 0. * * N (input) INTEGER * N specifies the (global) number of columns of the matrix * op( B ) and the (global) number of columns of the matrix * C. N >= 0. * * K (input) INTEGER * K specifies the (global) number of columns of the matrix * op( A ) and the (global) number of rows of the matrix * op( B ). K >= 0. * * MB (input) INTEGER * MB specifies the row block size of the matrices op( A ) and * C. MB >= 1. * * NB (input) INTEGER * NB specifies the column block size of the matrices op( B ) * and C. NB >= 1. * * KB (input) INTEGER * KB specifies the column block size of the matrix op( A ) * and row block size of the matrix op( B ). KB >= 1. * * ALPHA (input) DOUBLE PRECISION * ALPHA specifies the scalar alpha. * * A (input) DOUBLE PRECISION array of DIMENSION ( LDA, Lx ), * where Lx >= Kq when TRANSA = 'N' and Lx >= Mq otherwise, * see parameter details for the values of Mq and Kq. Before * entry with TRANSA = 'N', the leading Mp-by-Kq part of * the array A must contain the matrix A, otherwise the leading * Kp-by-Mq part of the array A must contain the matrix A. * * LDA (input) INTEGER * LDA specifies the first dimension of (local) A as * declared in the calling (sub) program. When TRANSA = 'N', * LDA >= MAX(1,Mp), otherwise LDA >= MAX(1,Kp). * * B (input) DOUBLE PRECISION array of DIMENSION ( LDB, Lx ), * where Lx is Nq when TRANSB = 'N' or 'n' and is Kq otherwise, * see parameter details for the values of Nx and Kx. Before * entry with TRANSB = 'N' or 'n', the leading Kp-by-Nq part of * the array B must contain the matrix B, otherwise the leading * Np-by-Kq part of the array B must contain the matrix B. * * LDB (input) INTEGER * LDB specifies the first dimension of (local) B as declared * in the calling (sub) program. When TRANSB = 'N', * LDB >= MAX(1,Kp), otherwise LDB >= MAX(1,Mq). * * BETA (input) DOUBLE PRECISION * On entry, BETA specifies the scalar beta. When BETA is * supplied as zero then C need not be set on input. * * C (input/output) DOUBLE PRECISION array of DIMENSION (LDC,Nq). * Before entry, the leading Mp-by-Nq part of the array C must * contain the matrix C, except when beta is zero, in which * case C need not be set on entry. On exit, the array C is * overwritten by the Mp-by-Nq matrix ( alpha*op( A )*op( B ) + * beta*C ). * Input values of C would be changed after the computation in * the processes which don't have the resultant column block * or row block of C if MATPOS <> 'C' and CWORK = 'Y'. * * LDC (input) INTEGER * LDC specifies the first dimension of (local) C as declared * in the calling (sub) program. LDC >= MAX(1,Mp). * * IAROW (input) INTEGER * It specifies the current row of process template which * has the first block of A. If A is a row block and all rows * of processes have their own copies of A, set IAROW = -1. * * IACOL (input) INTEGER * It specifies the current column of process template which * has the first block of A. If A is a column block and all * columns of processes have their own copies of A, set IACOL * = -1. * * IBROW (input) INTEGER * It specifies the current row of process template which has * the first block of B. If B is a row block and all rows of * processes have their own copies of B, set IBROW = -1. * * IBCOL (input) INTEGER * It specifies the current column of process template which * has the first block of B. If B is a column block and all * columns of processes have their own copies of B, set IBCOL * -1. * * ICROW (input) INTEGER * It specifies the current row of process template which has * the first block of C. * * ICCOL (input) INTEGER * It specifies the current column of process template which * has the first block of C. * * BR1ST (input) CHARACTER*1 * BR1ST determines which matrix (actually they are either * column blocks or row blocks) needs to be broadcast first, * A or B, when MATPOS = 'C', IAPOS >= 0, and IBPOS >= 0. * Otherwise, it is ignored. * * BR1ST = 'A': A is broadcast first, * * BR1ST = 'B': B is broadcast first. * * ACOMM (input) CHARACTER*1 * ACOMM specifies the communication scheme of row or column * block of A. It follows topology definition of BLACS. * * AWORK (input) CHARACTER*1 * AWORK determines whether A is a workspace or not. * * AWORK = 'Y': A is workspace in other processes. * A is sent to A position in other processes. * It is assumed that processes have * sufficient space to store (local) A. * AWORK = 'N': Data in A will be untouched (unchanged). * * If transposition of A is involved with the computation of C, * the argument is ignored. * * BCOMM (input) CHARACTER*1 * BCOMM specifies the communication scheme of row or column * block of B. It follows topology definition of BLACS. * * BWORK (input) CHARACTER*1 * BWORK determines whether B is a workspace or not. * * BWORK = 'Y': B is workspace in other processes. * B is sent to B position in other processes. * It is assumed that processes have * sufficient space to store (local) B. * BWORK = 'N': Data in B will be untouched (unchanged). * * If transposition of B is involved with the computation of C, * the argument is ignored. * * CWORK (input) CHARACTER*1 * CWORK determines whether C is a workspace or not. * * CWORK = 'Y': C is workspace in other processes. * It is assumed that processes have * sufficient space to store temporary * (local) C. * CWORK = 'N': Data in C will be untouched in other * processes. * * If MATPOS = 'C', or transposition of C is computed during * computation, the argument is ignored. * * WORK (workspace) DOUBLE PRECISION array of dimension Size(WORK). * It will store copies of A, B, and/or C. * * Parameters Details * ================== * * Lx It is a local portion of L owned by a process, (L is * replaced by M, N, or K, and x is replaced by either p * (=NPROW) or q (=NPCOL)). The value is determined by L, LB, * x, and MI, where LB is a block size and MI is a row or * column position in a process template. Lx is equal to or * less than Lx0 = CEIL( L, LB*x ) * LB. * * Memory Requirement of WORK * ========================== * * Mpb = CEIL( M, MB*NPROW ) * Nqb = CEIL( N, NB*NPCOL ) * Kqb = CEIL( K, KB*NPCOL ) * Mp0 = NUMROC( M, MB, 0, 0, NPROW ) ~= Mpb * MB * Nq0 = NUMROC( N, NB, 0, 0, NPCOL ) ~= Nqb * NB * Kq0 = NUMROC( K, KB, 0, 0, NPCOL ) ~= Kqb * KB * LCMQ = LCM / NPCOL * LCMP = LCM / NPROW * * (1) TRANSA = 'N' and TRANSB = 'N' * a) MATPOS = 'C' * Size(WORK) = K * Mp0 (if IACOL <> -1 and AWORK <> 'Y') * + K * Nq0 (if IBROW <> -1 and BWORK <> 'Y') * b) MATPOS = 'A' * Size(WORK) = N * Kq0 * + MAX[ N*CEIL(Kqb,LCMQ)*KB (if IBCOL <> -1), * N*CEIL(Kqb,LCMQ)*KB*MIN(LCMQ,CEIL(K,KB)) * (if IBCOL = -1), * N*Mp0 (if CWORK <> 'Y') ] * c) MATPOS = 'B' * Size(WORK) = M * Kp0 * + MAX[ M*CEIL(Kpb,LCMP)*KB (if IAROW <> -1), * M*CEIL(Kpb,LCMP)*KB*MIN(LCMP,CEIL(K,KB)) * (if IAROW = -1), * M*Nq0 (if CWORK <> 'Y') ] * * (2) TRANSA = 'T'/'C' and TRANSB = 'N' * a) MATPOS = 'C', * i) BR1ST = 'A' * Size(WORK) = K * Mp0 * + MAX[ K*CEIL(Mpb,LCMP)*MB (if IAROW <> -1), * K*CEIL(Mpb,LCMP)*MB*MIN(LCMP,CEIL(M,MB)) * (if IAROW = -1), * K*Nq0 (if IBROW <> -1 and BWORK <> 'Y') ] * ii) BR1ST = 'B' * Size(WORK) = K * Mp0 * + K * Nq0 (if IBROW <> -1 and BWORK <> 'Y') * + MAX[ K*CEIL(Mpb,LCMP)*MB (if IAROW <> -1), * K*CEIL(Mpb,LCMP)*MB*MIN(LCMP,CEIL(M,MB)) * (if IAROW = -1) ] * b) MATPOS = 'A' * Size(WORK) = N * Mq0 * + MAX[ N*CEIL(Mpb,LCMP)*MB, * N*Kp0 (if IBCOL <> -1 and BWORK <> 'Y') ] * c) MATPOS = 'B' * Size(WORK) = M * Kp0 (if IACOL <> -1 and AWORK <> 'Y') * + M * Nq0 (if CWORK <> 'Y') * * (3) TRANSA = 'N' and TRANSB = 'T'/'C' * a) MATPOS = 'C' * i) BR1ST = 'A' * Size(WORK) = K * Nq0 * + K * Mp0 (if IACOL <> -1 and AWORK <> 'Y') * + MAX[ K*CEIL(Nqb,LCMQ)*NB (if IBCOL <> -1), * K*CEIL(Nqb,LCMQ)*NB*MIN(LCMQ,CEIL(N,NB)) * (if IBCOL = -1) ] * ii) BR1ST = 'B' * Size(WORK) = K * Nq0 * + MAX[ K*CEIL(Nqb,LCMQ)*NB (if IBCOL <> -1), * K*CEIL(Nqb,LCMQ)*NB*MIN(LCMQ,CEIL(N,NB)) * (if IBCOL = -1), * K*Mp0 (if IACOL <> -1 and AWORK <> 'Y') ] * b) MATPOS = 'A' * Size(WORK) = N * Kq0 (if IBROW <> -1 and BWORK <> 'Y') * + N * Mp0 (if CWORK <> 'Y') * c) MATPOS = 'B' * Size(WORK) = M * Np0 * + MAX[ M*CEIL(Nqb,LCMQ)*NB, * M*Kq0 (if IAROW <> -1 and AWORK <> 'Y') ] * * (4) TRANSA = 'T'/'C' and TRANSB = 'T'/'C' * a) MATPOS = 'C' * Size(WORK) = K * Mp0 + K * Nq0 * + MAX[ K*CEIL(Mpb,LCMP)*MB (if IAROW <> -1), * K*CEIL(Mpb,LCMP)*MB*LCMP (if IAROW = -1), * K*CEIL(Nqb,LCMQ)*NB (if IBCOL <> -1), * K*CEIL(Nqb,LCMQ)*NB*LCMQ (if IBCOL = -1) ] * b) MATPOS = 'A' * Size(WORK) = N * Mq0 * + MAX[ N*CEIL(Mpb,LCMP)*MB, * N*Kp0+N*CEIL(Kpb,LCMP)*KB (if IBROW <> -1), * N*Kp0+N*CEIL(Kpb,LCMP)*KB*MIN(LCMP,CEIL(K,KB)) * (if IBROW = -1) ] * c) MATPOS = 'B' * Size(WORK) = M * Np0 * + MAX[ M*CEIL(Nqb,LCMQ)*NB, * M*Kq0+M*CEIL(Kqb,LCMQ)*KB (if IACOL <> -1), * M*Kq0+M*CEIL(Kqb,LCMQ)*KB*MIN(LCMQ,CEIL(K,KB)) * (if IACOL = -1) ] * Notes * ----- * More precise space can be computed as * * CEIL(Mpb,LCMP)*MB => NUMROC( NUMROC(M,MB,0,0,NPROW), MB, 0, 0, LCMP ) * = NUMROC( Mp0, MB, 0, 0, LCMP ) * CEIL(Nqb,LCMQ)*NB => NUMROC( NUMROC(N,NB,0,0,NPCOL), NB, 0, 0, LCMQ ) * = NUMROC( Nq0, NB, 0, 0, LCMQ ) * * ===================================================================== * * .. * .. Parameters .. DOUBLE PRECISION ONE, ZERO PARAMETER ( ONE = 1.0D+0, ZERO = 0.0D+0 ) * .. * .. Local Scalars .. CHARACTER*1 COMMA, COMMB LOGICAL ADATA, AMAT, ASPACE, BDATA, BSPACE, CMAT, $ CSPACE, NOTA, NOTB INTEGER INFO, IPA, IPB, IPC, IPT, KP, KQ, MP, MQ, $ MYCOL, MYROW, NP, NPCOL, NPROW, NQ DOUBLE PRECISION TBETA, DUMMY * .. * .. External Functions .. LOGICAL LSAME INTEGER NUMROC EXTERNAL LSAME, NUMROC * .. * .. External Subroutines .. EXTERNAL BLACS_GRIDINFO, DGEBR2D, DGEBS2D, DGEMM, $ DGSUM2D, PBDMATADD, PBDTRAN, PXERBLA * .. * .. Intrinsic Functions .. INTRINSIC MAX * .. * .. Executable Statements .. * * Quick return if possible. * IF( ( M.EQ.0 ).OR.( N.EQ.0 ).OR. $ ( ( ( ALPHA.EQ.ZERO ).OR.( K.EQ.0 ) ).AND.( BETA.EQ.ONE ) ) ) $ RETURN * CALL BLACS_GRIDINFO( ICONTXT, NPROW, NPCOL, MYROW, MYCOL ) * * Set NOTA and NOTB as true if A and B respectively are not * transposed. * NOTA = LSAME( TRANSA, 'N' ) NOTB = LSAME( TRANSB, 'N' ) CMAT = LSAME( MATPOS, 'C' ) AMAT = LSAME( MATPOS, 'A' ) * * Test the input parameters. * INFO = 0 IF( ( .NOT.CMAT ) .AND. ( .NOT.AMAT ) .AND. $ ( .NOT.LSAME( MATPOS, 'B' ) ) ) THEN INFO = 2 ELSE IF( ( .NOT.NOTA ).AND. $ ( .NOT.LSAME( TRANSA, 'C' ) ).AND. $ ( .NOT.LSAME( TRANSA, 'T' ) ) ) THEN INFO = 3 ELSE IF( ( .NOT.NOTB ).AND. $ ( .NOT.LSAME( TRANSB, 'C' ) ).AND. $ ( .NOT.LSAME( TRANSB, 'T' ) ) ) THEN INFO = 4 ELSE IF( M .LT.0 ) THEN INFO = 5 ELSE IF( N .LT.0 ) THEN INFO = 6 ELSE IF( K .LT.0 ) THEN INFO = 7 ELSE IF( MB .LT.1 ) THEN INFO = 8 ELSE IF( NB .LT.1 ) THEN INFO = 9 ELSE IF( KB .LT.1 ) THEN INFO = 10 ELSE IF( ICROW.LT.0 .OR. ICROW.GE.NPROW ) THEN INFO = 23 ELSE IF( ICCOL.LT.0 .OR. ICCOL.GE.NPCOL ) THEN INFO = 24 END IF * 10 CONTINUE IF( INFO.NE.0 ) THEN CALL PXERBLA( ICONTXT, 'PBDGEMM ', INFO ) RETURN END IF * * Start the operations. * * Initialize parameters * COMMA = ACOMM IF( LSAME( COMMA, ' ' ) ) COMMA = '1' COMMB = BCOMM IF( LSAME( COMMB, ' ' ) ) COMMB = '1' * IF( NOTB ) THEN IF( NOTA ) THEN * * Form C := alpha*A*B + beta*C. * IF( CMAT ) THEN * _____________ _ _____________ * | | | | | | * | | | | | | * | | | | _____________ | | * | C | = alpha*|A|*|______B______| + beta*| C | * | | | | | | * | | | | | | * |_____________| |_| |_____________| * * Initialize parameters * MP = NUMROC( M, MB, MYROW, ICROW, NPROW ) NQ = NUMROC( N, NB, MYCOL, ICCOL, NPCOL ) ASPACE = LSAME( AWORK, 'Y' ) BSPACE = LSAME( BWORK, 'Y' ) * IF( LDA.LT.MAX(1,MP) .AND. ( ASPACE .OR. $ IACOL.EQ.MYCOL .OR. IACOL.EQ.-1 ) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,K) .AND. ( BSPACE .OR. $ IBROW.EQ.MYROW .OR. IBROW.EQ.-1 ) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,MP) ) THEN INFO = 18 ELSE IF( IAROW.NE.ICROW ) THEN INFO = 19 ELSE IF( IACOL.LT.-1 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.-1 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.NE.ICCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * ADATA = .FALSE. IF( IACOL.EQ.-1 ) ADATA = .TRUE. BDATA = .FALSE. IF( IBROW.EQ.-1 ) BDATA = .TRUE. * * Broadcast B first then A * IF( LSAME( BR1ST, 'B' ) ) THEN * * Broadcast B first to WORK(IPB) if necessary (IBROW >= 0) * IPB = 1 IPA = 1 IF( .NOT.BDATA ) THEN IF( BSPACE ) THEN IF( MYROW.EQ.IBROW ) THEN CALL DGEBS2D( ICONTXT, 'Col', COMMB, K, NQ, B, LDB ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, K, NQ, B, LDB, $ IBROW, MYCOL ) END IF BDATA = .TRUE. ELSE IF( MYROW.EQ.IBROW ) THEN CALL PBDMATADD( ICONTXT, 'G', K, NQ, ONE, B, LDB, $ ZERO, WORK(IPB), K ) CALL DGEBS2D( ICONTXT, 'Col', COMMB, K, NQ, $ WORK(IPB), K ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, K, NQ, $ WORK(IPB), K, IBROW, MYCOL ) END IF IPA = K * NQ + 1 END IF END IF * * Broadcast A next to WORK(IPA) if necessary (IACOL >= 0) * IF( .NOT.ADATA ) THEN IF( ASPACE ) THEN IF( MYCOL.EQ.IACOL ) THEN CALL DGEBS2D( ICONTXT, 'Row', COMMA, MP, K, A, LDA ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, MP, K, A, LDA, $ MYROW, IACOL ) END IF ADATA = .TRUE. ELSE IF( MYCOL.EQ.IACOL ) THEN CALL PBDMATADD( ICONTXT, 'V', MP, K, ONE, A, LDA, $ ZERO, WORK(IPA), MP ) CALL DGEBS2D( ICONTXT, 'Row', COMMA, MP, K, $ WORK(IPA), MP ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, MP, K, $ WORK(IPA), MP, MYROW, IACOL ) END IF END IF END IF * * Broadcast A first then B * ELSE * * Broadcast A first to WORK(IPA) if necessary (IACOL >= 0) * IPA = 1 IPB = 1 IF( .NOT.ADATA ) THEN IF( ASPACE ) THEN IF( MYCOL.EQ.IACOL ) THEN CALL DGEBS2D( ICONTXT, 'Row', COMMA, MP, K, A, LDA ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, MP, K, A, LDA, $ MYROW, IACOL ) END IF ADATA = .TRUE. ELSE IF( MYCOL.EQ.IACOL ) THEN CALL PBDMATADD( ICONTXT, 'V', MP, K, ONE, A, LDA, $ ZERO, WORK(IPA), MP ) CALL DGEBS2D( ICONTXT, 'Row', COMMA, MP, K, $ WORK(IPA), MP ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, MP, K, $ WORK(IPA), MP, MYROW, IACOL ) END IF IPB = MP * K + 1 END IF END IF * * Broadcast B next to WORK(IPB) if necessary (IBROW >= 0) * IF( .NOT.BDATA ) THEN IF( BSPACE ) THEN IF( MYROW.EQ.IBROW ) THEN CALL DGEBS2D( ICONTXT, 'Col', COMMB, K, NQ, B, LDB ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, K, NQ, B, LDB, $ IBROW, MYCOL ) END IF BDATA = .TRUE. ELSE IF( MYROW.EQ.IBROW ) THEN CALL PBDMATADD( ICONTXT, 'G', K, NQ, ONE, B, LDB, $ ZERO, WORK(IPB), K ) CALL DGEBS2D( ICONTXT, 'Col', COMMB, K, NQ, $ WORK(IPB), K ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, K, NQ, $ WORK(IPB), K, IBROW, MYCOL ) END IF END IF END IF END IF * * A and B are distributed (ready), Compute C * IF( MP.GT.0 ) THEN IF( ADATA ) THEN IF( BDATA ) THEN CALL DGEMM( 'No', 'No', MP, NQ, K, ALPHA, A, LDA, $ B, LDB, BETA, C, LDC ) ELSE CALL DGEMM( 'No', 'No', MP, NQ, K, ALPHA, A, LDA, $ WORK(IPB), MAX(1,K), BETA, C, LDC ) END IF ELSE IF( BDATA ) THEN CALL DGEMM( 'No', 'No', MP, NQ, K, ALPHA, WORK(IPA), $ MP, B, LDB, BETA, C, LDC ) ELSE CALL DGEMM( 'No', 'No', MP, NQ, K, ALPHA, WORK(IPA), $ MP, WORK(IPB), MAX(1,K), BETA, C, LDC ) END IF END IF END IF * ELSE IF( AMAT ) THEN * _ _____________ _ _ * | | | | | | | | * | | | | | | | | * | | | | | | | | * |C| = alpha * | A | * |B| + beta * |C| * | | | | | | | | * | | | | | | | | * |_| |_____________| |_| |_| * * Initialize parameters * MP = NUMROC( M, MB, MYROW, IAROW, NPROW ) KP = NUMROC( K, KB, MYROW, IBROW, NPROW ) KQ = NUMROC( K, KB, MYCOL, IACOL, NPCOL ) CSPACE = LSAME( CWORK, 'Y' ) * IF( LDA.LT.MAX(1,MP) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,KP) .AND. $ ( IBCOL.EQ.MYCOL .OR. IBCOL.EQ.-1 ) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,MP) .AND. $ ( ICCOL.EQ.MYCOL .OR. CSPACE ) ) THEN INFO = 18 ELSE IF( IAROW.NE.ICROW ) THEN INFO = 19 ELSE IF( IACOL.LT.0 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.0 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.LT.-1 .OR. IBCOL.GE.NPCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * * Transpose B to WORK * IPC = N * KQ + 1 CALL PBDTRAN( ICONTXT, 'Col', 'Trans', K, N, KB, B, LDB, $ ZERO, WORK, N, IBROW, IBCOL, -1, IACOL, $ WORK(IPC) ) * * Compute C * IF( CSPACE ) THEN TBETA = ZERO IF( MYCOL.EQ.ICCOL ) TBETA = BETA * IF( KQ.GT.0 ) THEN CALL DGEMM( 'No', 'Trans', MP, N, KQ, ALPHA, A, LDA, $ WORK, N, TBETA, C, LDC ) ELSE IF( K.GT.KB .OR. IACOL.NE.ICCOL $ .OR. MYCOL.EQ.ICCOL ) THEN CALL PBDMATADD( ICONTXT, 'V', MP, N, ZERO, DUMMY, 1, $ TBETA, C, LDC ) END IF * * Add C rowwise * IF( K.GT.KB .OR. IACOL.NE.ICCOL ) $ CALL DGSUM2D( ICONTXT, 'Row', '1-tree', MP, N, C, LDC, $ MYROW, ICCOL ) * ELSE IF( KQ.GT.0 ) THEN CALL DGEMM( 'No', 'Trans', MP, N, KQ, ALPHA, A, LDA, $ WORK, N, ZERO, WORK(IPC), MAX(1,MP) ) ELSE CALL PBDMATADD( ICONTXT, 'V', MP, N, ZERO, DUMMY, 1, $ ZERO, WORK(IPC), MP ) END IF * * Add WORK(IPC) rowwise * IF( MYCOL.EQ.ICCOL ) THEN CALL PBDMATADD( ICONTXT, 'G', MP, N, ONE, WORK(IPC), $ MP, BETA, C, LDC ) IF( K.GT.KB .OR. IACOL.NE.ICCOL ) $ CALL DGSUM2D( ICONTXT, 'Row', '1-tree', MP, N, C, LDC, $ MYROW, ICCOL ) ELSE IF( K.GT.KB .OR. IACOL.NE.ICCOL ) THEN CALL DGSUM2D( ICONTXT, 'Row', '1-tree', MP, N, $ WORK(IPC), MP, MYROW, ICCOL ) END IF END IF * ELSE * IF( LSAME( MATPOS, 'B' ) ) THEN * ____________ * | | * | | * ____________ _____________ | | ____________ * |_____C______| = a*|______A______|*| B |+b*|_____C______| * | | * | | * |____________| * * Initialize parameters * NQ = NUMROC( N, NB, MYCOL, IBCOL, NPCOL ) KP = NUMROC( K, KB, MYROW, IBROW, NPROW ) CSPACE = LSAME( CWORK, 'Y' ) * IF( LDA.LT.MAX(1,M) .AND. $ ( IAROW.EQ.MYROW .OR. IAROW.EQ.-1 ) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,KP) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,M) .AND. $ ( ICROW.EQ.MYROW .OR. CSPACE ) ) THEN INFO = 18 ELSE IF( IAROW.LT.-1 .OR. IAROW.GE.NPROW ) THEN INFO = 19 ELSE IF( IACOL.LT.0 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.0 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.NE.ICCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * * Transpose A to WORK * IPC = KP * M + 1 CALL PBDTRAN( ICONTXT, 'Row', 'Trans', M, K, KB, A, LDA, $ ZERO, WORK, KP, IAROW, IACOL, IBROW, -1, $ WORK(IPC) ) * * Compute C * IF( CSPACE ) THEN TBETA = ZERO IF( MYROW.EQ.ICROW ) TBETA = BETA * IF( KP.GT.0 ) THEN CALL DGEMM( 'Trans', 'No', M, NQ, KP, ALPHA, WORK, $ KP, B, LDB, TBETA, C, LDC ) ELSE IF( K.GT.KB .OR. IBROW.NE.ICROW $ .OR. MYROW.EQ.ICROW ) THEN CALL PBDMATADD( ICONTXT, 'G', M, NQ, ZERO, DUMMY, 1, $ TBETA, C, LDC ) END IF * * Add C columnwise * IF( K.GT.KB .OR. IBROW.NE.ICROW ) $ CALL DGSUM2D( ICONTXT, 'Col', '1-tree', M, NQ, C, LDC, $ ICROW, MYCOL ) * ELSE IF( KP.GT.0 ) THEN CALL DGEMM( 'Trans', 'No', M, NQ, KP, ALPHA, WORK, $ KP, B, LDB, ZERO, WORK(IPC), M ) ELSE CALL PBDMATADD( ICONTXT, 'G', M, NQ, ZERO, DUMMY, 1, $ ZERO, WORK(IPC), M ) END IF * * Add WORK(IPC) columnwise * IF( MYROW.EQ.ICROW ) THEN CALL PBDMATADD( ICONTXT, 'G', M, NQ, ONE, WORK(IPC), M, $ BETA, C, LDC ) IF( K.GT.KB .OR. IBROW.NE.ICROW ) $ CALL DGSUM2D( ICONTXT, 'Col', '1-tree', M, NQ, C, LDC, $ ICROW, MYCOL ) ELSE IF( K.GT.KB .OR. IBROW.NE.ICROW ) THEN CALL DGSUM2D( ICONTXT, 'Col', '1-tree', M, NQ, $ WORK(IPC), M, ICROW, MYCOL ) END IF END IF END IF * ELSE * * Form C := alpha*A'*B + beta*C * IF( CMAT ) THEN * ____________ ____________ * | | | | * | | | | * | | _____________ ____________ | | * | C | = a*|______A______|*|_____B______|+b*| C | * | | ( A') | | * | | | | * |____________| |____________| * * Initialize parameters * MP = NUMROC( M, MB, MYROW, ICROW, NPROW ) NQ = NUMROC( N, NB, MYCOL, ICCOL, NPCOL ) BSPACE = LSAME( BWORK, 'Y' ) * IF( LDA.LT.MAX(1,K) .AND. $ ( IAROW.EQ.MYROW .OR. IAROW.EQ.-1 ) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,K) .AND. ( BSPACE .OR. $ IBROW.EQ.MYROW .OR. IBROW.EQ.-1 ) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,MP) ) THEN INFO = 18 ELSE IF( IAROW.LT.-1 .OR. IAROW.GE.NPROW ) THEN INFO = 19 ELSE IF( IACOL.LT.0 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.-1 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.NE.ICCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * BDATA = .FALSE. IF( IBROW.EQ.-1 ) BDATA = .TRUE. * * Broadcast B first then transpose A * IF( LSAME( BR1ST, 'B' ) ) THEN * * Broadcast B if necessary ( IBROW >= 0 ) * IPB = 1 IPA = 1 IF( .NOT.BDATA ) THEN IF( BSPACE ) THEN IF( MYROW.EQ.IBROW ) THEN CALL DGEBS2D( ICONTXT, 'Col', COMMB, K, NQ, B, LDB ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, K, NQ, B, LDB, $ IBROW, MYCOL ) END IF BDATA = .TRUE. ELSE IF( MYROW.EQ.IBROW ) THEN CALL PBDMATADD( ICONTXT, 'G', K, NQ, ONE, B, LDB, $ ZERO, WORK(IPB), K ) CALL DGEBS2D( ICONTXT, 'Col', COMMB, K, NQ, $ WORK(IPB), K ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, K, NQ, $ WORK(IPB), K, IBROW, MYCOL ) END IF IPA = K * NQ + 1 END IF END IF * * Transpose A to WORK(IPA) * IPT = MP * K + IPA CALL PBDTRAN( ICONTXT, 'Row', TRANSA, K, M, MB, A, LDA, $ ZERO, WORK(IPA), MP, IAROW, IACOL, ICROW, $ -1, WORK(IPT) ) * * Transpose A first then broadcast B * ELSE * * Transpose A to WORK(IPA) * IPA = 1 IPB = MP * K + 1 CALL PBDTRAN( ICONTXT, 'Row', TRANSA, K, M, MB, A, LDA, $ ZERO, WORK(IPA), MP, IAROW, IACOL, ICROW, $ -1, WORK(IPB) ) * * Broadcast B to WORK(IPB) if necessary ( IBROW >= 0 ) * IF( .NOT.BDATA ) THEN IF( BSPACE ) THEN IF( MYROW.EQ.IBROW ) THEN CALL DGEBS2D( ICONTXT, 'Col', COMMB, K, NQ, B, LDB ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, K, NQ, B, LDB, $ IBROW, MYCOL ) END IF BDATA = .TRUE. ELSE IF( MYROW.EQ.IBROW ) THEN CALL PBDMATADD( ICONTXT, 'G', K, NQ, ONE, B, LDB, $ ZERO, WORK(IPB), K ) CALL DGEBS2D( ICONTXT, 'Col', COMMB, K, NQ, $ WORK(IPB), K ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, K, NQ, $ WORK(IPB), K, IBROW, MYCOL ) END IF END IF END IF END IF * * A and B are distributed (ready), Compute C * IF( MP.GT.0 ) THEN IF( BDATA ) THEN CALL DGEMM( 'No', 'No', MP, NQ, K, ALPHA, WORK(IPA), $ MP, B, LDB, BETA, C, LDC ) ELSE CALL DGEMM( 'No', 'No', MP, NQ, K, ALPHA, WORK(IPA), $ MP, WORK(IPB), MAX(1,K), BETA, C, LDC ) END IF END IF * ELSE IF( AMAT ) THEN * _ _____________ _ _ * | | | | | | | | * | | | | | | | | * | | | | | | | | * |C| = alpha * | A | * |B| + beta * |C| * | | | ( A') | | | | | * | | | | | | | | * |_| |_____________| |_| |_| * * Initialize parameters * MP = NUMROC( M, MB, MYROW, ICROW, NPROW ) MQ = NUMROC( M, MB, MYCOL, IACOL, NPCOL ) KP = NUMROC( K, KB, MYROW, IAROW, NPROW ) BSPACE = LSAME( BWORK, 'Y' ) * IF( LDA.LT.MAX(1,KP) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,KP) .AND. ( BSPACE .OR. $ IBCOL.EQ.MYCOL .OR. IBCOL.EQ.-1 ) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,MP) .AND. MYCOL.EQ.ICCOL ) THEN INFO = 18 ELSE IF( IAROW.LT.0 .OR. IAROW.GE.NPROW ) THEN INFO = 19 ELSE IF( IACOL.LT.0 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.NE.IAROW ) THEN INFO = 21 ELSE IF( IBCOL.LT.-1 .OR. IBCOL.GE.NPCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * BDATA = .FALSE. IF( IBCOL.EQ.-1 ) BDATA = .TRUE. * * Broadcast B to B or WORK(IPT) if necessary ( IBCOL >= 0 ) * IPB = N * MQ + 1 IF( .NOT.BDATA ) THEN IF( BSPACE ) THEN IF( MYCOL.EQ.IBCOL ) THEN CALL DGEBS2D( ICONTXT, 'Row', COMMB, KP, N, B, LDB ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMB, KP, N, B, LDB, $ MYROW, IBCOL ) END IF BDATA = .TRUE. ELSE IF( MYCOL.EQ.IBCOL ) THEN CALL PBDMATADD( ICONTXT, 'G', KP, N, ONE, B, LDB, $ ZERO, WORK(IPB), KP ) CALL DGEBS2D( ICONTXT, 'Row', COMMB, KP, N, $ WORK(IPB), KP ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMB, KP, N, $ WORK(IPB), KP, MYROW, IBCOL ) END IF END IF END IF * * Compute C' (<= B' * A ) * IF( KP.GT.0 ) THEN IF( BDATA ) THEN CALL DGEMM( TRANSA, 'No', N, MQ, KP, ALPHA, B, LDB, $ A, LDA, ZERO, WORK, N ) ELSE CALL DGEMM( TRANSA, 'No', N, MQ, KP, ALPHA, WORK(IPB), $ MAX(1,KP), A, LDA, ZERO, WORK, N ) END IF ELSE CALL PBDMATADD( ICONTXT, 'G', N, MQ, ZERO, DUMMY, 1, $ ZERO, WORK, N ) END IF * * Add C (= WORK) columnwise and transpose it * IF( K.GT.KB ) $ CALL DGSUM2D( ICONTXT, 'Col', '1-tree', N, MQ, WORK, N, $ IAROW, MYCOL ) * CALL PBDTRAN( ICONTXT, 'Row', TRANSA, N, M, MB, WORK, N, $ BETA, C, LDC, IAROW, IACOL, ICROW, ICCOL, $ WORK(IPB) ) * ELSE * IF( LSAME( MATPOS, 'B' ) ) THEN * _ ____________ * | | | | * | | | | * ____________ | | | | ____________ * |_____C______| = alpha*|A| * | B | + beta*|_____C______| * ( A') | | * | | | | * |_| |____________| * * Initialize parameters * NQ = NUMROC( N, NB, MYCOL, IBCOL, NPCOL ) KP = NUMROC( K, KB, MYROW, IBROW, NPROW ) ASPACE = LSAME( AWORK, 'Y' ) CSPACE = LSAME( CWORK, 'Y' ) * IF( LDA.LT.MAX(1,KP) .AND. ( ASPACE .OR. $ IACOL.EQ.MYCOL .OR. IACOL.EQ.-1 ) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,KP) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,M) .AND. $ ( MYROW.EQ.ICROW .OR. CSPACE ) ) THEN INFO = 18 ELSE IF( IAROW.NE.IBROW ) THEN INFO = 19 ELSE IF( IACOL.LT.-1 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.0 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.NE.ICCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * ADATA = .FALSE. IF( IACOL.EQ.-1 ) ADATA = .TRUE. IPC = 1 * * Broadcast A if necessary ( IACOL >= 0 ) * IF( .NOT.ADATA ) THEN IF( ASPACE ) THEN IF( MYCOL.EQ.IACOL ) THEN CALL DGEBS2D( ICONTXT, 'Row', COMMA, KP, M, A, LDA ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, KP, M, A, LDA, $ MYROW, IACOL ) END IF ADATA = .TRUE. ELSE IF( MYCOL.EQ.IACOL ) THEN CALL PBDMATADD( ICONTXT, 'V', KP, M, ONE, A, LDA, $ ZERO, WORK, KP ) CALL DGEBS2D( ICONTXT, 'Row', COMMA, KP, M, WORK, KP ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, KP, M, WORK, KP, $ MYROW, IACOL ) END IF IPC = KP * M + 1 END IF END IF * * Compute C * IF( CSPACE ) THEN TBETA = ZERO IF( MYROW.EQ.ICROW ) TBETA = BETA * IF( KP.GT.0 ) THEN IF( ADATA ) THEN CALL DGEMM( TRANSA, 'No', M, NQ, KP, ALPHA, A, LDA, $ B, LDB, TBETA, C, LDC ) ELSE CALL DGEMM( TRANSA, 'No', M, NQ, KP, ALPHA, WORK, $ KP, B, LDB, TBETA, C, LDC ) END IF ELSE IF( K.GT.KB .OR. IBROW.NE.ICROW $ .OR. MYROW.EQ.ICROW ) THEN CALL PBDMATADD( ICONTXT,'G', M, NQ, ZERO, DUMMY, 1, $ TBETA, C, LDC ) END IF * * Add C columnwise * IF( K.GT.KB .OR. IBROW.NE.ICROW ) $ CALL DGSUM2D( ICONTXT, 'Col', '1-tree', M, NQ, C, LDC, $ ICROW, MYCOL ) * ELSE IF( KP.GT.0 ) THEN IF( ADATA ) THEN CALL DGEMM( TRANSA, 'No', M, NQ, KP, ALPHA, A, LDA, $ B, LDB, ZERO, WORK(IPC), M ) ELSE CALL DGEMM( TRANSA, 'No', M, NQ, KP, ALPHA, WORK, $ KP, B, LDB, ZERO, WORK(IPC), M ) END IF ELSE CALL PBDMATADD( ICONTXT, 'G', M, NQ, ZERO, DUMMY, 1, $ ZERO, WORK(IPC), M ) END IF * * Add WORK(IPC) columnwise * IF( MYROW.EQ.ICROW ) THEN CALL PBDMATADD( ICONTXT, 'G', M, NQ, ONE, WORK(IPC), M, $ BETA, C, LDC ) IF( K.GT.KB .OR. IBROW.NE.ICROW ) $ CALL DGSUM2D( ICONTXT, 'Col', '1-tree', M, NQ, C, LDC, $ ICROW, MYCOL ) ELSE IF( K.GT.KB .OR. IBROW.NE.ICROW ) THEN CALL DGSUM2D( ICONTXT, 'Col', '1-tree', M, NQ, $ WORK(IPC), M, ICROW, MYCOL ) END IF END IF END IF END IF * ELSE IF( NOTA ) THEN * * Form C := alpha*A*B' + beta*C * IF( CMAT ) THEN * _____________ _ _ _____________ * | | | | | | | | * | | | | | | | | * | | | | | | | | * | C | = alpha * |A| * |B| + beta * | C | * | | | | ( B') | | * | | | | | | | | * |_____________| |_| |_| |_____________| * * Initialize parameters * MP = NUMROC( M, MB, MYROW, ICROW, NPROW ) NP = NUMROC( N, NB, MYROW, IBROW, NPROW ) NQ = NUMROC( N, NB, MYCOL, ICCOL, NPCOL ) ASPACE = LSAME( AWORK, 'Y' ) * IF( LDA.LT.MAX(1,MP) .AND. ( ASPACE .OR. $ IACOL.EQ.MYCOL .OR. IACOL.EQ.-1 ) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,NP) .AND. $ ( IBCOL.EQ.MYCOL .OR. IBCOL.EQ.-1 ) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,MP) ) THEN INFO = 18 ELSE IF( IAROW.NE.ICROW ) THEN INFO = 19 ELSE IF( IACOL.LT.-1 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.0 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.LT.-1 .OR. IBCOL.GE.NPCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * ADATA = .FALSE. IF( IACOL.EQ.-1 ) ADATA = .TRUE. BDATA = .FALSE. IF( IBCOL.EQ.-1 ) BDATA = .TRUE. * * Transpose B first then broadcast A * IF( LSAME( BR1ST, 'B' ) ) THEN * * Transpose B to WORK(IPB) * IPB = 1 IPA = K * NQ + 1 CALL PBDTRAN( ICONTXT, 'Col', TRANSB, N, K, NB, B, LDB, $ ZERO, WORK(IPB), K, IBROW, IBCOL, -1, ICCOL, $ WORK(IPA) ) * * Broadcast A to WORK(IPA) if necessary ( IACOL >= 0 ) * IF( .NOT.ADATA ) THEN IF( ASPACE ) THEN IF( MYCOL.EQ.IACOL ) THEN CALL DGEBS2D( ICONTXT, 'Row', COMMA, MP, K, A, LDA ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, MP, K, A, LDA, $ MYROW, IACOL ) END IF ADATA = .TRUE. ELSE IF( MYCOL.EQ.IACOL ) THEN CALL PBDMATADD( ICONTXT, 'V', MP, K, ONE, A, LDA, $ ZERO, WORK(IPA), MP ) CALL DGEBS2D( ICONTXT, 'Row', COMMA, MP, K, $ WORK(IPA), MP ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, MP, K, $ WORK(IPA), MP, MYROW, IACOL ) END IF END IF END IF * * Broadcast A first then transpose B * ELSE * * Broadcast A to WORK(IPA) if necessary ( IACOL >= 0 ) * IPA = 1 IPB = 1 IF( .NOT.ADATA ) THEN IF( ASPACE ) THEN IF( MYCOL.EQ.IACOL ) THEN CALL DGEBS2D( ICONTXT, 'Row', COMMA, MP, K, A, LDA ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, MP, K, A, LDA, $ MYROW, IACOL ) END IF ADATA = .TRUE. ELSE IF( MYCOL.EQ.IACOL ) THEN CALL PBDMATADD( ICONTXT, 'V', MP, K, ONE, A, LDA, $ ZERO, WORK(IPA), MP ) CALL DGEBS2D( ICONTXT, 'Row', COMMA, MP, K, $ WORK(IPA), MP ) ELSE CALL DGEBR2D( ICONTXT, 'Row', COMMA, MP, K, $ WORK(IPA), MP, MYROW, IACOL ) END IF IPB = MP * K + 1 END IF END IF * * Transpose B to WORK(IPB) * IPT = K * NQ + IPB CALL PBDTRAN( ICONTXT, 'Col', TRANSB, N, K, NB, B, LDB, $ ZERO, WORK(IPB), K, IBROW, IBCOL, -1, ICCOL, $ WORK(IPT) ) END IF * * A and B are distributed, then compute C * IF( MP.GT.0 ) THEN IF( ADATA ) THEN CALL DGEMM( 'No', 'No', MP, NQ, K, ALPHA, A, LDA, $ WORK(IPB), MAX(1,K), BETA, C, LDC ) ELSE CALL DGEMM( 'No', 'No', MP, NQ, K, ALPHA, WORK(IPA), $ MP, WORK(IPB), MAX(1,K), BETA, C, LDC ) END IF END IF * ELSE IF( LSAME( MATPOS, 'A' ) ) THEN * _ _____________ _ * | | | | | | * | | | | | | * | | | | _____________ | | * |C| = alpha * | A |*|______B______| + beta * |C| * | | | | ( B') | | * | | | | | | * |_| |_____________| |_| * * Initialize parameters * MP = NUMROC( M, MB, MYROW, IAROW, NPROW ) KQ = NUMROC( K, KB, MYCOL, IACOL, NPCOL ) BSPACE = LSAME( BWORK, 'Y' ) CSPACE = LSAME( CWORK, 'Y' ) * IF( LDA.LT.MAX(1,MP) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,N) .AND. ( BSPACE .OR. $ IBROW.EQ.MYROW .OR. IBROW.EQ.-1 ) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,MP) .AND. $ ( CSPACE .OR. ICCOL.EQ.MYCOL ) ) THEN INFO = 18 ELSE IF( IAROW.NE.ICROW ) THEN INFO = 19 ELSE IF( IACOL.LT.0 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.-1 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.NE.IACOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * BDATA = .FALSE. IF( IBROW.EQ.-1 ) BDATA = .TRUE. IPC = 1 * * Broadcast B to WORK if necessary ( IBROW >= 0 ) * IF( .NOT.BDATA ) THEN IF( BSPACE ) THEN IF( MYROW.EQ.IBROW ) THEN CALL DGEBS2D( ICONTXT, 'Col', COMMB, N, KQ, B, LDB ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, N, KQ, B, LDB, $ IBROW, MYCOL ) END IF BDATA = .TRUE. ELSE IF( MYROW.EQ.IBROW ) THEN CALL PBDMATADD( ICONTXT, 'G', N, KQ, ONE, B,LDB, $ ZERO, WORK, N ) CALL DGEBS2D( ICONTXT, 'Col', COMMB, N, KQ, WORK, N ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMB, N, KQ, WORK, N, $ IBROW, MYCOL ) END IF IPC = N * KQ + 1 END IF END IF * * Compute C * IF( CSPACE ) THEN TBETA = ZERO IF( MYCOL.EQ.ICCOL ) TBETA = BETA * IF( KQ.GT.0 ) THEN IF( BDATA ) THEN CALL DGEMM( 'No', TRANSB, MP, N, KQ, ALPHA, A, LDA, $ B, LDB, TBETA, C, LDC ) ELSE CALL DGEMM( 'No', TRANSB, MP, N, KQ, ALPHA, A, LDA, $ WORK, N, TBETA, C, LDC ) END IF ELSE IF( K.GT.KB .OR. IACOL.NE.ICCOL $ .OR. MYCOL.EQ.ICCOL ) THEN CALL PBDMATADD( ICONTXT, 'G', MP, N, ZERO, DUMMY, 1, $ TBETA, C, LDC ) END IF * * Add C rowwise * IF( K.GT.KB .OR. IACOL.NE.ICCOL ) $ CALL DGSUM2D( ICONTXT, 'Row', '1-tree', MP, N, C, LDC, $ MYROW, ICCOL ) * ELSE IF( KQ.GT.0 ) THEN IF( BDATA ) THEN CALL DGEMM( 'No', TRANSB, MP, N, KQ, ALPHA, A, LDA, $ B, LDB, ZERO, WORK(IPC), MAX(1,MP) ) ELSE CALL DGEMM( 'No', TRANSB, MP, N, KQ, ALPHA, A, LDA, $ WORK, N, ZERO, WORK(IPC), MAX(1,MP) ) END IF ELSE CALL PBDMATADD( ICONTXT, 'G', MP, N, ZERO, DUMMY, 1, $ ZERO, WORK(IPC), MP ) END IF * * Add WORK(IPC) rowwise * IF( MYCOL.EQ.ICCOL ) THEN CALL PBDMATADD( ICONTXT, 'V', MP, N, ONE, WORK(IPC), $ MP, BETA, C, LDC ) IF( K.GT.KB .OR. IACOL.NE.ICCOL ) $ CALL DGSUM2D( ICONTXT, 'Row', '1-tree', MP, N, C, LDC, $ MYROW, ICCOL ) ELSE IF( K.GT.KB .OR. IACOL.NE.ICCOL ) THEN CALL DGSUM2D( ICONTXT, 'Row', '1-tree', MP, N, $ WORK(IPC), MP, MYROW, ICCOL ) END IF END IF * ELSE IF( LSAME( MATPOS, 'B' ) ) THEN * ____________ * | | * | | * ____________ _____________ | | ____________ * |_____C______| = a*|______A______|*| B |+b*|_____C______| * | ( B') | * | | * |____________| * * Initialize parameters * NP = NUMROC( N, NB, MYROW, IBROW, NPROW ) KQ = NUMROC( K, KB, MYCOL, IBCOL, NPCOL ) ASPACE = LSAME( AWORK, 'Y' ) * IF( LDA.LT.MAX(1,M) .AND. ( ASPACE .OR. $ IAROW.EQ.MYROW .OR. IAROW.EQ.-1 ) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,NP) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,M) .AND. MYROW.EQ.ICROW ) THEN INFO = 18 ELSE IF( IAROW.LT.-1 .OR. IAROW.GE.NPROW ) THEN INFO = 19 ELSE IF( IACOL.NE.IBCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.0 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.LT.0 .OR. IBCOL.GE.NPCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * ADATA = .FALSE. IF( IAROW.EQ.-1 ) ADATA = .TRUE. * * Broadcast A to WORK(IPT) if necessary ( IAROW >= 0 ) * IPA = NP * M + 1 IF( .NOT.ADATA ) THEN IF( ASPACE ) THEN IF( MYROW.EQ.IAROW ) THEN CALL DGEBS2D( ICONTXT, 'Col', COMMA, M, KQ, A, LDA ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMA, M, KQ, A, LDA, $ IAROW, MYCOL ) END IF ADATA = .TRUE. ELSE IF( MYROW.EQ.IAROW ) THEN CALL PBDMATADD( ICONTXT, 'G', M, KQ, ONE, A, LDA, $ ZERO, WORK(IPA), M ) CALL DGEBS2D( ICONTXT, 'Col', COMMA, M, KQ, $ WORK(IPA), M ) ELSE CALL DGEBR2D( ICONTXT, 'Col', COMMA, M, KQ, $ WORK(IPA), M, IAROW, MYCOL ) END IF END IF END IF * * Compute and distribute C' (<= B * A' ) * IF( KQ.GT.0 ) THEN IF( ADATA ) THEN CALL DGEMM( 'No', TRANSB, NP, M, KQ, ALPHA, B, LDB, $ A, LDA, ZERO, WORK, MAX(1,NP) ) ELSE CALL DGEMM( 'No', TRANSB, NP, M, KQ, ALPHA, B, LDB, $ WORK(IPA), M, ZERO, WORK, MAX(1,NP) ) END IF ELSE CALL PBDMATADD( ICONTXT, 'G', NP, M, ZERO, DUMMY, 1, $ ZERO, WORK, NP ) END IF * * Add C rowwise and transpose it * IF( K.GT.KB ) $ CALL DGSUM2D( ICONTXT, 'Row', '1-tree', NP, M, WORK, NP, $ MYROW, IBCOL ) * CALL PBDTRAN( ICONTXT, 'Col', TRANSB, N, M, NB, WORK, NP, $ BETA, C, LDC, IBROW, IBCOL, ICROW, ICCOL, $ WORK(IPA) ) END IF * ELSE * * Form C := alpha*A'*B' + beta*C * IF( CMAT ) THEN * _____________ _ _____________ * | | | | | | * | | | | | | * | | _____________ | | | | * | C | = alpha*|______A______|*|B| + beta*| C | * | | ( A') ( B') | | * | | | | | | * |_____________| |_| |_____________| * * Initialize parameters * MP = NUMROC( M, MB, MYROW, ICROW, NPROW ) NP = NUMROC( N, NB, MYROW, IBROW, NPROW ) NQ = NUMROC( N, NB, MYCOL, ICCOL, NPCOL ) * IF( LDA.LT.MAX(1,K) .AND. $ ( IAROW.EQ.MYROW .OR. IAROW.EQ.-1 ) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,NP) .AND. $ ( IBCOL.EQ.MYCOL .OR. IBCOL.EQ.-1 ) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,MP) ) THEN INFO = 18 ELSE IF( IAROW.LT.-1 .OR. IAROW.GE.NPROW ) THEN INFO = 19 ELSE IF( IACOL.LT.0 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.0 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.LT.-1 .OR. IBCOL.GE.NPCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * * Transpose B first then A * IF( LSAME( BR1ST, 'B' ) ) THEN * * Transpose B first to WORK(IPB) * IPB = 1 IPA = K * NQ + IPB CALL PBDTRAN( ICONTXT, 'Col', TRANSB, N, K, NB, B, LDB, $ ZERO, WORK(IPB), K, IBROW, IBCOL, -1, ICCOL, $ WORK(IPA) ) * * Transpose A next to WORK(IPA) * IPT = MP * K + IPA CALL PBDTRAN( ICONTXT, 'Row', TRANSA, K, M, MB, A, LDA, $ ZERO, WORK(IPA), MP, IAROW, IACOL, ICROW, $ -1, WORK(IPT) ) * * Transpose A first then B * ELSE * * Transpose A first to WORK(IPA) * IPA = 1 IPB = MP * K + 1 CALL PBDTRAN( ICONTXT, 'Row', TRANSA, K, M, MB, A, LDA, $ ZERO, WORK(IPA), MP, IAROW, IACOL, ICROW, $ -1, WORK(IPB) ) * * Transpose B next to WORK(IPB) * IPT = K * NQ + IPB CALL PBDTRAN( ICONTXT, 'Col', TRANSB, N, K, NB, B, LDB, $ ZERO, WORK(IPB), K, IBROW, IBCOL, -1, ICCOL, $ WORK(IPT) ) END IF * * A and B are distributed, then compute C * CALL DGEMM( 'No', 'No', MP, NQ, K, ALPHA, WORK(IPA), $ MAX(1,MP), WORK(IPB), MAX(1,K), BETA, C, LDC ) * ELSE IF( AMAT ) THEN * _ _____________ _ * | | | | | | * | | | | | | * | | | | _____________ | | * |C| = alpha * | A | * |______B______| + beta * |C| * | | | ( A') | ( B') | | * | | | | | | * |_| |_____________| |_| * * Initialize parameters * MP = NUMROC( M, MB, MYROW, ICROW, NPROW ) MQ = NUMROC( M, MB, MYCOL, IACOL, NPCOL ) KP = NUMROC( K, KB, MYROW, IAROW, NPROW ) * IF( LDA.LT.MAX(1,KP) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,N) .AND. $ ( IBROW.EQ.MYROW .OR. IBROW.EQ.-1 ) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,MP) .AND. $ ( ICCOL.EQ.MYCOL .OR. ICCOL.EQ.-1 ) ) THEN INFO = 18 ELSE IF( IAROW.LT.0 .OR. IAROW.GE.NPROW ) THEN INFO = 19 ELSE IF( IACOL.LT.0 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.-1 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.LT.0 .OR. IBCOL.GE.NPCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * * Transpose B to WORK(IPB) * IPB = N * MQ + 1 IPT = KP * N + IPB CALL PBDTRAN( ICONTXT, 'Row', TRANSB, N, K, KB, B, LDB, $ ZERO, WORK(IPB), KP, IBROW, IBCOL, IAROW, -1, $ WORK(IPT) ) * * Compute C' ( <= B' * A ) * IF( KP.GT.0 ) THEN CALL DGEMM( TRANSA, 'No', N, MQ, KP, ALPHA, WORK(IPB), $ KP, A, LDA, ZERO, WORK, N ) ELSE CALL PBDMATADD( ICONTXT, 'G', N, MQ, ZERO, DUMMY, 1, $ ZERO, WORK, N ) END IF * * Add C columnwise and transpose it * IF( K.GT.KB ) $ CALL DGSUM2D( ICONTXT, 'Col', '1-tree', N, MQ, WORK, N, $ IAROW, MYCOL ) * CALL PBDTRAN( ICONTXT, 'Row', TRANSA, N, M, MB, WORK, N, $ BETA, C, LDC, IAROW, IACOL, ICROW, ICCOL, $ WORK(IPB) ) * ELSE * IF( LSAME( MATPOS, 'B') ) THEN * _ ____________ * | | | | * | | | | * ____________ | | | | ____________ * |______C_____| = alpha*|A| * | B | + beta*|_____C______| * ( A') | ( B') | * | | | | * |_| |____________| * * Initialize parameters * NP = NUMROC( N, NB, MYROW, IBROW, NPROW ) KP = NUMROC( K, KB, MYROW, IAROW, NPROW ) KQ = NUMROC( K, KB, MYCOL, IBCOL, NPCOL ) * IF( LDA.LT.MAX(1,KP) .AND. $ ( IACOL.EQ.MYCOL .OR. IACOL.EQ.-1 ) ) THEN INFO = 13 ELSE IF( LDB.LT.MAX(1,NP) ) THEN INFO = 15 ELSE IF( LDC.LT.MAX(1,M) .AND. $ ( ICROW.EQ.MYROW .OR. ICROW.EQ.-1 ) ) THEN INFO = 18 ELSE IF( IAROW.LT.0 .OR. IAROW.GE.NPROW ) THEN INFO = 19 ELSE IF( IACOL.LT.-1 .OR. IACOL.GE.NPCOL ) THEN INFO = 20 ELSE IF( IBROW.LT.0 .OR. IBROW.GE.NPROW ) THEN INFO = 21 ELSE IF( IBCOL.LT.0 .OR. IBCOL.GE.NPCOL ) THEN INFO = 22 END IF IF( INFO.NE.0 ) GO TO 10 * * Transpose A to WORK(IPA) * IPA = NP * M + 1 IPT = M * KQ + IPA CALL PBDTRAN( ICONTXT, 'Col', TRANSA, K, M, KB, A, LDA, $ ZERO, WORK(IPA), M, IAROW, IACOL, -1, IBCOL, $ WORK(IPT) ) * * Compute C' ( <= B * A' ) * IF( KQ.GT.0 ) THEN CALL DGEMM( 'No', TRANSB, NP, M, KQ, ALPHA, B, LDB, $ WORK(IPA), M, ZERO, WORK, MAX(1,NP) ) ELSE CALL PBDMATADD( ICONTXT, 'G', NP, M, ZERO, DUMMY, 1, $ ZERO, WORK, NP ) END IF * * Add C rowwise and transpose it * IF( K.GT.KB ) $ CALL DGSUM2D( ICONTXT, 'Row', '1-tree', NP, M, WORK, NP, $ MYROW, IBCOL ) * CALL PBDTRAN( ICONTXT, 'Col', TRANSB, N, M, NB, WORK, NP, $ BETA, C, LDC, IBROW, IBCOL, ICROW, ICCOL, $ WORK(IPA) ) END IF END IF END IF * RETURN * * End of PBDGEMM * END