Unit 4.3.2 Parallelizing the first loop around the microkernel

One can similarly parallelize the first loop around the microkernel:

Homework 4.3.2.1.

In directory Week4/C,

  • Copy Gemm_Parallel_Loop2_12x4.c into Gemm_Parallel_Loop1_12x4.c.

  • Modify it so that only the first loop around the microkernel is parallelized.

  • Set the number of threads to some number between \(1\) and the number of CPUs in the target processor.

  • Execute make Parallel_Loop1_12x4.

  • View the resulting performance with data/ShowPerformance.mlx, uncommenting the appropriate lines. (You should be able to do this so that you see previous performance curves as well.)

  • Be sure to check that you got the right answer!

  • How does the performance improve relative to the number of threads being used?
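
As a rough illustration of the idea behind this homework, the following sketch places an OpenMP `parallel for` directive on a first loop around a microkernel. It is not the course's file: the names `LoopOne_Sketch` and `MicroKernel_Sketch` are hypothetical, the microkernel is a plain C stand-in rather than a vectorized one, the packing layout of the micro-panels is an assumption, and `m` is assumed to be a multiple of `MR` to keep the code short.

```c
#include <assert.h>
#include <stddef.h>

#define MR 12
#define NR 4

/* Hypothetical stand-in for a packed microkernel (not the course's code).
   Updates an MR x NR block of column-major C with the product of
   an MR x k micro-panel of A (assumed stored one column of MR entries at a time)
   and a k x NR micro-panel of B (assumed stored one row of NR entries at a time). */
static void MicroKernel_Sketch( int k, const double *MicroPanelA,
                                const double *MicroPanelB, double *C, int ldC )
{
  for ( int p=0; p<k; p++ )
    for ( int j=0; j<NR; j++ )
      for ( int i=0; i<MR; i++ )
        C[ (size_t)j*ldC + i ] += MicroPanelA[ p*MR + i ] * MicroPanelB[ p*NR + j ];
}

/* First loop around the microkernel: steps through the rows of C in blocks of MR.
   Each iteration updates a different block of rows of C, so the iterations
   are independent and can be divided among threads. */
void LoopOne_Sketch( int m, int k, const double *Atilde,
                     const double *MicroPanelB, double *C, int ldC )
{
  assert( m % MR == 0 );   /* simplifying assumption of this sketch */

  #pragma omp parallel for
  for ( int i=0; i<m; i+=MR )
    /* The micro-panel holding rows i, ..., i+MR-1 of the packed A starts at offset i*k. */
    MicroKernel_Sketch( k, &Atilde[ (size_t)i*k ], MicroPanelB, &C[ i ], ldC );
}
```

With gcc, the directive takes effect only when compiling with `-fopenmp`; otherwise it is ignored and the loop runs sequentially. The number of threads can then be set through the `OMP_NUM_THREADS` environment variable. Because this loop sits inside all the other loops, each parallel region it creates covers relatively little work, which is worth keeping in mind when interpreting the performance curves.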