GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark

  • Authors:
  • Bo Kågström; Per Ling; Charles van Loan

  • Affiliations:
  • Umeå Univ., Umeå, Sweden; Umeå Univ., Umeå, Sweden; Cornell Univ., Ithaca, NY

  • Venue:
  • ACM Transactions on Mathematical Software (TOMS)
  • Year:
  • 1998

Abstract

The level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Due to the complex hardware organization of advanced computer architectures, the development of optimal level 3 BLAS code is costly and time-consuming. However, it is possible to develop a portable and high-performance level 3 BLAS library that relies mainly on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning, all the other level 3 BLAS can be defined in terms of GEMM and a small amount of level 1 and level 2 computations. Our contribution is twofold. First, the model implementations in Fortran 77 of the GEMM-based level 3 BLAS are structured to effectively reduce data traffic in a memory hierarchy. Second, the GEMM-based level 3 BLAS performance evaluation benchmark is a tool for evaluating and comparing different implementations of the level 3 BLAS with the GEMM-based model implementations.
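
To illustrate the partitioning idea, here is a minimal sketch of one level 3 operation, the triangular matrix multiply B := A·B with A lower triangular, written so that most of the work falls into GEMM-style updates. It is not the authors' Fortran 77 code: the function names (`gemm_update`, `diag_block_trmm`, `trmm_lower`), the block size `nb`, and the pure-Python list-of-lists storage are all assumptions made for illustration, and the plain triple loop merely stands in for an optimized GEMM.

```python
def gemm_update(A, B, C, rows, inner, cols):
    """C[i][j] += sum_k A[i][k] * B[k][j] for i in rows, k in inner, j in cols.

    Stand-in for an optimized GEMM call on one block (hypothetical helper).
    """
    for i in rows:
        for j in cols:
            s = 0.0
            for k in inner:
                s += A[i][k] * B[k][j]
            C[i][j] += s


def diag_block_trmm(A, B, rows, cols):
    """B[rows, cols] := tril(A[rows, rows]) * B[rows, cols], in place.

    This is the small non-GEMM part of the work. Rows are processed
    bottom-up so each row only reads rows that are still unmodified.
    """
    for i in reversed(rows):
        for j in cols:
            s = 0.0
            for k in range(rows.start, i + 1):  # lower triangle only
                s += A[i][k] * B[k][j]
            B[i][j] = s


def trmm_lower(A, B, nb=2):
    """Overwrite B with tril(A) * B using block rows of height nb."""
    n, m = len(A), len(B[0])
    cols = range(m)
    # Walk block rows from the bottom so the blocks of B above the
    # current one still hold their original values when they are read.
    for start in reversed(range(0, n, nb)):
        rows = range(start, min(start + nb, n))
        # Diagonal block: small triangular multiply.
        diag_block_trmm(A, B, rows, cols)
        # Off-diagonal part: one rectangular GEMM-style update.
        gemm_update(A, B, B, rows, range(0, start), cols)
    return B
```

In this sketch the entries above the diagonal of A are never read, and as `nb` shrinks relative to the matrix order, a growing share of the arithmetic moves into `gemm_update`, which is the property a GEMM-based library exploits.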