## Unit 3.3.6 Micro-kernel with packed data

Unit 3.3.5 discussed how to modify the five loops to incorporate packing. Figure 3.3.8 illustrates a micro-kernel that computes with the packed data when $m_R \times n_R = 4 \times 4$.
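To make the data access pattern concrete, here is a minimal scalar sketch of a $4 \times 4$ micro-kernel that reads from packed micro-panels. It is not the code from Figure 3.3.8 or from Gemm_4x4Kernel_Packed.c: the function name `gemm_4x4_kernel_packed_sketch` is hypothetical, and plain loops stand in for the vector instructions a tuned kernel would use. It assumes that $C$ is stored in column-major order, that the micro-panel of $A$ is packed one column ($m_R$ contiguous entries) at a time, and that the micro-panel of $B$ is packed one row ($n_R$ contiguous entries) at a time.

```c
#include <stddef.h>

#define MR 4
#define NR 4

/* Hypothetical scalar sketch of a micro-kernel that computes C := A B + C
   for one MR x NR block of C, reading A and B from packed micro-panels.

   Assumed layout (an assumption of this sketch):
     MP_A: MR x k micro-panel of A; column p occupies MP_A[p*MR .. p*MR+MR-1].
     MP_B: k x NR micro-panel of B; row p occupies MP_B[p*NR .. p*NR+NR-1].
     C:    column-major with leading dimension ldC. */
void gemm_4x4_kernel_packed_sketch(int k, const double *MP_A,
                                   const double *MP_B, double *C, int ldC)
{
    double acc[MR][NR];

    /* Load the MR x NR block of C into local accumulators. */
    for (int j = 0; j < NR; j++)
        for (int i = 0; i < MR; i++)
            acc[i][j] = C[(size_t)j * ldC + i];

    /* One rank-1 update per iteration. Both micro-panels are read
       contiguously, which is the point of packing. */
    for (int p = 0; p < k; p++) {
        for (int j = 0; j < NR; j++)
            for (int i = 0; i < MR; i++)
                acc[i][j] += MP_A[i] * MP_B[j];
        MP_A += MR;  /* advance to the next column of the A micro-panel */
        MP_B += NR;  /* advance to the next row of the B micro-panel */
    }

    /* Store the updated block back into C. */
    for (int j = 0; j < NR; j++)
        for (int i = 0; i < MR; i++)
            C[(size_t)j * ldC + i] = acc[i][j];
}
```

Because the packed panels are traversed with a fixed stride of `MR` and `NR`, the kernel needs no leading dimensions for $A$ or $B$; only $C$ is still addressed through `ldC`.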

###### Homework 3.3.6.2.

Copy the file Gemm_4x4Kernel_Packed.c into file Gemm_12x4Kernel_Packed.c. Modify that file so that it uses $m_R \times n_R = 12 \times 4$. Test the result with

make Five_Loops_Packed_12x4Kernel


and view the resulting performance with Live Script Plot_Five_Loops.mlx.

Solution

Assignments/Week3/Answers/Gemm_12x4Kernel_Packed.c

On Robert's laptop:

Now we are getting somewhere!

###### Homework 3.3.6.3.

In Homework 3.2.3.1, you determined the best block sizes MC and KC. Now that you have added packing to the implementation of the five loops around the micro-kernel, these parameters need to be revisited. You can collect data for a range of choices by executing

make Five_Loops_Packed_?x?Kernel_MCxKC


where ?x? is your favorite choice for register blocking. View the result with data/Plot_Five_Loops.mlx.
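When interpreting the results for different MC and KC, it may help to know how much memory the packed block of $A$ occupies for each choice, so that it can be compared with the cache sizes of your own machine. The helper below is a small sketch for that purpose. The name `packed_A_bytes` is hypothetical, and the code assumes that the block is packed into micro-panels of `mr` rows, with the last micro-panel padded to a full `mr` rows when MC is not a multiple of `mr`.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for a packed MC x KC block of A.
   Assumption of this sketch: the block is stored as micro-panels of
   mr rows each, and a partial last micro-panel is padded to mr rows. */
size_t packed_A_bytes(int mc, int kc, int mr)
{
    /* Number of micro-panels, rounding up. */
    size_t panels = ((size_t)mc + (size_t)mr - 1) / (size_t)mr;
    return panels * (size_t)mr * (size_t)kc * sizeof(double);
}
```

For example, `packed_A_bytes(96, 96, 12)` is `96 * 96 * sizeof(double)` bytes, since 96 is a multiple of 12 and no padding is needed.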