A parallel block implementation of Level-3 BLAS for MIMD vector processors

  • Authors:
  • Michel J. Daydé; Iain S. Duff; Antoine Petitet

  • Affiliations:
  • ENSEEIHT-IRIT, Toulouse, France; CERFACS, Toulouse, France; CERFACS, Toulouse, France

  • Venue:
  • ACM Transactions on Mathematical Software (TOMS)
  • Year:
  • 1994

Abstract

We describe an implementation of the Level-3 BLAS (Basic Linear Algebra Subprograms) based on the matrix-matrix multiplication kernel (GEMM). Blocking techniques are used to express the BLAS in terms of operations on triangular blocks and calls to GEMM. A principal advantage of this approach is that most manufacturers provide at least an efficient serial version of GEMM, so our implementation can capture a significant fraction of the performance of the computer. A parameter that controls the blocking allows efficient exploitation of the memory hierarchy of each target computer. Furthermore, this blocked version of the Level-3 BLAS is naturally parallel. We present results on the ALLIANT FX/80, the CONVEX C220, the CRAY-2, and the IBM 3090/VF. For GEMM, we always use the manufacturer-supplied versions. For the operations on triangular blocks, we use assembler or tuned Fortran codes (using loop unrolling), depending on the efficiency of the available libraries.
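To illustrate the blocking idea described in the abstract, the sketch below shows a left-sided lower-triangular solve (the TRSM operation) decomposed into small triangular solves on nb-by-nb diagonal blocks plus GEMM updates on the remaining block rows, so that most of the arithmetic is routed through GEMM. This is a minimal illustration written in C against the CBLAS interface, not the authors' code (their implementation is Fortran and assembler); the function name blocked_trsm_lower, the block-size argument nb, and the 0-based column-major indexing are assumptions made for this example.

    /* Sketch (assumption, not the paper's code): blocked forward
     * substitution solving L * X = B in place, L lower triangular.
     * Small triangular solves act on nb x nb diagonal blocks; the bulk
     * of the flops go through GEMM, which is where a vendor-tuned
     * kernel delivers most of the machine's performance. */
    #include <cblas.h>

    /* L is n x n lower triangular, B is n x m (overwritten by X),
     * both column-major with leading dimensions ldl and ldb. */
    void blocked_trsm_lower(int n, int m, const double *L, int ldl,
                            double *B, int ldb, int nb)
    {
        for (int k = 0; k < n; k += nb) {
            int kb = (n - k < nb) ? (n - k) : nb;  /* diagonal block size */

            /* Small triangular solve on the diagonal block:
             * X_k = inv(L_kk) * B_k.  In the paper this kernel is tuned
             * Fortran or assembler; here we simply call TRSM on it. */
            cblas_dtrsm(CblasColMajor, CblasLeft, CblasLower,
                        CblasNoTrans, CblasNonUnit,
                        kb, m, 1.0, &L[k + (size_t)k * ldl], ldl,
                        &B[k], ldb);

            /* GEMM update of the trailing block rows:
             * B(k+kb:n, :) -= L(k+kb:n, k:k+kb) * X_k. */
            int rest = n - k - kb;
            if (rest > 0)
                cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                            rest, m, kb,
                            -1.0, &L[(k + kb) + (size_t)k * ldl], ldl,
                                  &B[k], ldb,
                             1.0, &B[k + kb], ldb);
        }
    }

The block size nb plays the role of the blocking parameter mentioned in the abstract: it is chosen so that the blocks fit the memory hierarchy of the target machine, and the independent GEMM updates are the natural source of parallelism.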