Robert van de Geijn, Margaret Myers, Devangi Parikh
Unit 3.3.6 Micro-kernel with packed data
Figure 3.3.8. Blocking for multiple levels of cache, with packing.
Reference implementations of packing routines can be found in Figure 3.3.2, Figure 3.3.3, Figure 3.3.5, and Figure 3.3.6. While these implementations can be optimized, the cost of packing lies mostly in moving data between main memory and faster memory. As a result, optimizing the packing routines themselves has relatively little effect.
How to modify the five loops to incorporate packing is illustrated in Unit 3.3.5. A micro-kernel to compute with the packed data when \(m_R \times n_R = 4 \times 4\) is illustrated in Figure 3.3.8.
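To make the access pattern concrete, here is a minimal sketch of a \(4 \times 4\) micro-kernel that computes with packed data. It is written in plain C rather than with vector intrinsics, so it is not the implementation from the figure. The routine name `gemm_4x4_packed_kernel`, the macros `MR` and `NR`, and the storage layouts are assumptions made for illustration: the packed micro-panel of \(A\) is assumed to store element \((i,p)\) at `packed_A[p*MR + i]`, the packed micro-panel of \(B\) is assumed to store element \((p,j)\) at `packed_B[p*NR + j]`, and \(C\) is assumed to be stored in column-major order with leading dimension `ldC`.

```c
#include <assert.h>
#include <stddef.h>

#define MR 4   /* rows in the micro-tile of C (assumed)    */
#define NR 4   /* columns in the micro-tile of C (assumed) */

/* Hypothetical plain-C micro-kernel: C := C + A~ * B~ for an MR x NR tile.
   Assumed layouts (not confirmed by the text):
   - packed_A holds an MR x k micro-panel, column by column:
       element (i,p) is packed_A[p*MR + i]
   - packed_B holds a k x NR micro-panel, row by row:
       element (p,j) is packed_B[p*NR + j]
   - C is column-major with leading dimension ldC. */
void gemm_4x4_packed_kernel(int k, const double *packed_A,
                            const double *packed_B, double *C, int ldC)
{
    assert(k >= 0 && ldC >= MR);

    /* Local accumulator standing in for vector registers. */
    double c[MR * NR];

    /* Load the micro-tile of C. */
    for (int j = 0; j < NR; j++)
        for (int i = 0; i < MR; i++)
            c[j * MR + i] = C[(size_t)j * ldC + i];

    /* One rank-1 update per iteration; both packed buffers are
       traversed contiguously, with stride one. */
    for (int p = 0; p < k; p++) {
        for (int j = 0; j < NR; j++) {
            double beta = packed_B[j];
            for (int i = 0; i < MR; i++)
                c[j * MR + i] += packed_A[i] * beta;
        }
        packed_A += MR;   /* next column of the micro-panel of A */
        packed_B += NR;   /* next row of the micro-panel of B    */
    }

    /* Store the updated micro-tile back into C. */
    for (int j = 0; j < NR; j++)
        for (int i = 0; i < MR; i++)
            C[(size_t)j * ldC + i] = c[j * MR + i];
}
```

Under these assumed layouts the kernel needs no leading dimensions for \(A\) and \(B\): advancing by `MR` and `NR` elements per iteration replaces the strided accesses of the unpacked version, which is the point of packing.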