Anatomy of high-performance matrix multiplication

  • Authors:
  • Kazushige Goto;Robert A. van de Geijn

  • Affiliations:
  • The University of Texas at Austin, Austin, TX;The University of Texas at Austin, Austin, TX

  • Venue:
  • ACM Transactions on Mathematical Software (TOMS)
  • Year:
  • 2008

Quantified Score

Hi-index 0.01

Visualization

Abstract

We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.