Algorithm used for PLA_Gemm_B:

For the case where C <- alpha * A * B + beta * C is to be computed,
we partition

    C = / C_0     \   and   A = / A_0     \
        |  :      |             |  :      |
        \ C_(K-1) /             \ A_(K-1) /

and, after first scaling C <- beta * C, iterate computing
C_i <- alpha * A_i * B + C_i.  In other words, the computation is set up
as a sequence of panel (of rows)-matrix multiplies.

More precisely, the algorithm is given by

******************************************************************

C <- beta * C

Partition A = / A_F \   and C = / C_F \
              | === |           | === |
              \ A_L /           \ C_L /
     where A_F is 0 x k and C_F is 0 x n

while A_L is not 0 x k
   determine block size b

   Partition / A_F \   / A_0 \       / C_F \   / C_0 \
             | === | = | === |  and  | === | = | === |
             \ A_L /   | A_1 |       \ C_L /   | C_1 |
                       | --- |                 | --- |
                       \ A_2 /                 \ C_2 /
        where A_0 = A_F and A_1 has b rows
              C_0 = C_F and C_1 has b rows

   Update C_1 <- alpha * A_1 * B + C_1   (panel-matrix mult.)

   Continue with
             / A_F \   / A_0 \       / C_F \   / C_0 \
             | === | = | --- |  and  | === | = | --- |
             \ A_L /   | A_1 |       \ C_L /   | C_1 |
                       | === |                 | === |
                       \ A_2 /                 \ C_2 /
endwhile

******************************************************************

Appropriate changes need to be made depending on transa and transb.
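
The loop described above can be illustrated with a small serial sketch. It is not the PLAPACK implementation: it ignores data distribution and the transa/transb cases, and the names gemm_by_row_panels and block_size are hypothetical, introduced only for illustration. A fixed block_size stands in for the "determine block size b" step, and matrices are assumed to be plain lists of rows.

```python
def gemm_by_row_panels(alpha, A, B, beta, C, block_size=2):
    """Compute C <- alpha * A * B + beta * C in place, one row panel at a time.

    A is m x k, B is k x n, C is m x n, each stored as a list of rows.
    block_size is a hypothetical fixed stand-in for "determine block size b".
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")

    m = len(A)                      # number of rows of A and C
    k = len(B)                      # inner dimension
    n = len(C[0]) if m else 0       # number of columns of B and C

    # C <- beta * C, done once so every panel update only accumulates.
    for row in C:
        for j in range(n):
            row[j] *= beta

    first = 0                       # rows already processed (A_F, C_F)
    while first < m:                # while A_L is not 0 x k
        b = min(block_size, m - first)

        # Panel-matrix multiply: C_1 <- alpha * A_1 * B + C_1,
        # where A_1 and C_1 are rows first .. first + b - 1.
        for i in range(first, first + b):
            for j in range(n):
                acc = 0.0
                for p in range(k):
                    acc += A[i][p] * B[p][j]
                C[i][j] += alpha * acc

        first += b                  # A_1, C_1 join A_F, C_F
    return C
```

Because C is scaled by beta before the loop, the result does not depend on the block size chosen; only the grouping of rows into panels changes.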