Unit 4.1.3 What you will learn

This week, we discover how to parallelize matrix-matrix multiplication across the multiple cores of a processor.

Upon completion of this week, we will be able to

  • Exploit multiple cores by multithreading our implementation.

  • Direct the compiler to parallelize code sections with OpenMP.

  • Parallelize the different loops and interpret the resulting performance.

  • Recognize when loops can be parallelized easily and when more care must be taken.

  • Apply the concepts of speedup and efficiency to implementations of matrix-matrix multiplication.

  • Analyze limitations on parallel efficiency due to Amdahl's law.
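To make these goals concrete, here is a minimal sketch in C of the kind of code they refer to. It is an illustration only, not the implementation developed in this week: the function names (gemm_parallel_j, speedup, efficiency, amdahl_speedup_bound), the column-major storage, and the choice to parallelize the loop over columns of C are all assumptions made for the example. The OpenMP directive only takes effect when the compiler is told to honor it (for example, with -fopenmp for gcc); otherwise the loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: C := A B + C with column-major storage.
   A is m x k, B is k x n, C is m x n; ldA, ldB, ldC are leading dimensions.
   The loop over j is parallelized: each thread updates different columns
   of C, so no two threads write to the same element. */
void gemm_parallel_j(int m, int n, int k,
                     const double *A, int ldA,
                     const double *B, int ldB,
                     double *C, int ldC)
{
    #pragma omp parallel for
    for (int j = 0; j < n; j++)
        for (int p = 0; p < k; p++)
            for (int i = 0; i < m; i++)
                C[i + (size_t)j * ldC] +=
                    A[i + (size_t)p * ldA] * B[p + (size_t)j * ldB];
}

/* Speedup: sequential time divided by parallel time. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Efficiency: speedup divided by the number of threads. */
double efficiency(double t_serial, double t_parallel, int threads)
{
    assert(threads > 0);
    return speedup(t_serial, t_parallel) / threads;
}

/* Amdahl's law: if a fraction f_serial of the sequential time cannot be
   parallelized, the speedup with the given number of threads is at most
   1 / (f_serial + (1 - f_serial) / threads). */
double amdahl_speedup_bound(double f_serial, int threads)
{
    assert(f_serial >= 0.0 && f_serial <= 1.0 && threads > 0);
    return 1.0 / (f_serial + (1.0 - f_serial) / threads);
}
```

For instance, if half of the execution time is inherently sequential, amdahl_speedup_bound(0.5, 2) gives a bound of about 1.33, and no number of threads can push the speedup to 2 or beyond.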

The enrichments introduce us to

  • The casting of other linear algebra operations in terms of matrix-matrix multiplication.

  • The benefits of having a family of algorithms for a specific linear algebra operation and where to learn how to systematically derive such a family.

  • Operations encountered in machine learning that resemble matrix-matrix multiplication, to which the techniques can be extended.

  • Parallelizing matrix-matrix multiplication for distributed memory architectures.

  • Applying the learned techniques to the implementation of matrix-matrix multiplication on GPUs.