Start by considering the FORTRAN coding of the BLAS routine dgemm.
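For orientation, dgemm computes C := alpha*op(A)*op(B) + beta*C, where op(X) is either X or its transpose. The simplest case, with neither operand transposed, can be sketched as follows. This is a minimal illustration in C and not the FORTRAN reference code: the name gemm_nn_sketch and the indexing macros are hypothetical, the loop order is just one reasonable choice, and the argument checking and the special cases for alpha = 0 or beta = 0 that a real BLAS handles are left out. Matrices are stored by columns, as in FORTRAN.

```c
/* Column-major element access, as FORTRAN stores arrays:
   element (i,j) of a matrix with leading dimension ld sits at i + j*ld
   (0-based here, whereas FORTRAN indices start at 1). */
#define A(i, p) a[(i) + (p) * lda]
#define B(p, j) b[(p) + (j) * ldb]
#define C(i, j) c[(i) + (j) * ldc]

/* Hypothetical sketch of C := alpha*A*B + beta*C with no transposes.
   A is m x k, B is k x n, C is m x n. */
void gemm_nn_sketch(int m, int n, int k, double alpha,
                    const double *a, int lda,
                    const double *b, int ldb,
                    double beta, double *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        /* Scale column j of C by beta. */
        for (int i = 0; i < m; i++)
            C(i, j) *= beta;

        /* Add alpha * A * (column j of B), one column of A at a time,
           so the innermost loop walks down contiguous columns. */
        for (int p = 0; p < k; p++) {
            double t = alpha * B(p, j);
            for (int i = 0; i < m; i++)
                C(i, j) += t * A(i, p);
        }
    }
}
```

The innermost loop runs over the row index so that memory is accessed with stride one, which matters for the kind of performance comparison discussed here.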
For this routine, I have gone through the following steps:
but still in FORTRAN. The result is the routine dgemm_notrans_notrans, given in
Notice that by typing
    yes | rm *.o
    make driver
one creates a version that does not use an optimizer, whereas typing
    yes | rm *.o
    make driver3
creates a version that uses optimization level -O 3.
Some performance numbers (in MFLOPS):