High-performance implementation of the level-3 BLAS

  • Authors: Kazushige Goto; Robert Van De Geijn
  • Affiliations: The University of Texas at Austin, Austin, TX (both authors)
  • Venue: ACM Transactions on Mathematical Software (TOMS)
  • Year: 2008

Abstract

A simple but highly effective approach for transforming high-performance implementations on cache-based architectures of matrix-matrix multiplication into implementations of other commonly used matrix-matrix computations (the level-3 BLAS) is presented. Exceptional performance is demonstrated on various architectures.
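
The general idea of expressing another level-3 BLAS operation in terms of matrix-matrix multiplication can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `gemm`, `pack_symmetric_block`, and `symm` are hypothetical, the triple loop stands in for a tuned GEMM kernel, and the choice of a symmetric multiply (C += A B with A stored in its lower triangle) as the example operation is an assumption made here for illustration.

```python
def gemm(a, b, c):
    """Reference C += A @ B on lists of rows (stand-in for a tuned GEMM kernel)."""
    for i in range(len(a)):
        for p in range(len(b)):
            aip = a[i][p]
            for j in range(len(b[0])):
                c[i][j] += aip * b[p][j]
    return c


def pack_symmetric_block(a_lower, rows, cols):
    """Copy one block of a symmetric matrix into a dense buffer.

    Only the lower triangle of a_lower is read; an entry above the
    diagonal is taken from its mirror image below the diagonal.
    """
    return [[a_lower[i][j] if i >= j else a_lower[j][i] for j in cols]
            for i in rows]


def symm(a_lower, b, c, block=2):
    """C += A @ B with A symmetric (lower triangle stored), built from gemm calls."""
    n = len(a_lower)
    for i0 in range(0, n, block):
        rows = range(i0, min(i0 + block, n))
        for p0 in range(0, n, block):
            cols = range(p0, min(p0 + block, n))
            a_blk = pack_symmetric_block(a_lower, rows, cols)  # dense copy of the block
            b_blk = b[p0:p0 + block]                           # matching rows of B
            c_blk = [c[i] for i in rows]                       # aliases rows of C
            gemm(a_blk, b_blk, c_blk)                          # updates C in place
    return c
```

In this sketch the only symmetric-specific code is the packing step; the multiplication itself is an ordinary `gemm` call on dense blocks, so any speedup in `gemm` carries over to `symm` unchanged.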