
Unit 3.3.3 Implementation: packing row panel of \(B_{p,j} \)

We briefly discuss the packing of the row panel \(B_{p,j} \) into \(\widetilde B_{p,j} \text{.}\)

We break the implementation, in Assignments/Week3/C/PackB.c, down into two routines. The first loops over all the micro-panels that need to be packed, as illustrated in Figure 3.3.2.

void PackPanelB_KCxNC( int k, int n, double *B, int ldB,
	    double *Btilde )
/* Pack a KC x NC panel of B.  NC is assumed to be a multiple of NR.
   The block is packed into Btilde a micro-panel at a time. */
{
  for ( int j=0; j<n; j+= NR ){
    int jb = min( NR, n-j );

    PackMicroPanelB_KCxNR( k, jb, &beta( 0, j ), ldB, Btilde );
    Btilde += k * jb;
  }
}

Figure 3.3.2. A reference implementation for packing \(B_{p,j} \text{.}\)
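
To make the resulting layout concrete, the following minimal sketch computes where element \((p,j)\) of the panel ends up in the packed buffer, assuming the panel width is a multiple of NR. The helper name `PackedOffsetB` and the value `NR = 4` are assumptions made for illustration; neither is part of PackB.c.

```c
#include <assert.h>
#include <stddef.h>

#ifndef NR
#define NR 4   /* assumed micro-panel width, for illustration only */
#endif

/* Hypothetical helper (not part of PackB.c): offset within Btilde of
   element (p, j) of a k x n panel, assuming n is a multiple of NR.
   Micro-panels are stored one after another, each holding k * NR
   doubles, and within a micro-panel the elements are stored row by row. */
size_t PackedOffsetB( int k, int p, int j )
{
  assert( k > 0 && 0 <= p && p < k && j >= 0 );

  size_t micro_panel  = (size_t)( j / NR );  /* which micro-panel holds column j */
  size_t col_in_panel = (size_t)( j % NR );  /* column within that micro-panel   */

  return micro_panel * (size_t) k * NR + (size_t) p * NR + col_in_panel;
}
```

For example, with `k = 3` the first row of the second micro-panel starts at offset `3 * NR`, right after the `3 * NR` doubles of the first micro-panel.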

That routine then calls a routine that packs a micro-panel, given in Figure 3.3.3.

void PackMicroPanelB_KCxNR( int k, int n, double *B, int ldB,
	    double *Btilde )
/* Pack a micro-panel of B into buffer pointed to by Btilde.
   This is an unoptimized implementation for general KC and NR.

   k is assumed to be less than or equal to KC.
   n is assumed to be less than or equal to NR.  */
{
  /* March through B in row-major order, packing into Btilde. */
  if ( n == NR ) {
    /* Full column width micro-panel.*/
    for ( int p=0; p<k; p++ )
      for ( int j=0; j<NR; j++ )
        *Btilde++ = beta( p, j );
  }
  else {
    /* Not a full row size micro-panel. We pad with zeroes.
       To be added */
  }
}

Figure 3.3.3. A reference implementation for packing a micro-panel of \(B_{p,j} \text{.}\)
Remark 3.3.4.

Notice that these routines only work when the sizes are "nice" (here, when the width of the panel is a multiple of NR). We leave it as a challenge to generalize all implementations so that matrix-matrix multiplication with arbitrary problem sizes works. To manage the complexity of this, we recommend "padding" the matrices with zeroes as they are being packed. This then keeps the micro-kernel simple.
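
One possible way to approach the padding for a micro-panel of \(B \) is sketched below. This is only an illustration under stated assumptions, not the course's solution: the routine name `PackMicroPanelB_KCxNR_padded` is hypothetical, `NR = 4` is an arbitrary choice, and the `beta` macro is assumed to index \(B \) in column-major order with leading dimension `ldB`.

```c
#include <assert.h>
#include <stddef.h>

#ifndef NR
#define NR 4   /* assumed micro-panel width, for illustration only */
#endif

/* Assumed column-major indexing into B with leading dimension ldB. */
#define beta( i, j ) B[ (size_t)(j) * ldB + (i) ]

/* Hypothetical padded variant: always writes k * NR doubles into Btilde.
   Columns 0..n-1 of each row are copied from B; columns n..NR-1 are
   filled with zeroes, so the micro-kernel can always assume width NR. */
void PackMicroPanelB_KCxNR_padded( int k, int n, const double *B, int ldB,
                                   double *Btilde )
{
  assert( k >= 0 && 0 <= n && n <= NR && ldB >= k );

  for ( int p=0; p<k; p++ ) {
    for ( int j=0; j<n; j++ )     /* copy the columns that exist      */
      *Btilde++ = beta( p, j );
    for ( int j=n; j<NR; j++ )    /* pad the remainder of the row     */
      *Btilde++ = 0.0;
  }
}
```

With a scheme like this, the calling loop would need to advance `Btilde` by `k * NR` rather than `k * jb`, and the packed buffer must be large enough for the panel width rounded up to a multiple of NR. The extra zeroes contribute nothing to the products, but the micro-kernel then updates a full-width micro-tile, so storing the result into the edge of \(C \) still needs separate care.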