Minimizing development and maintenance costs in supporting persistently optimized BLAS

  • Authors:
  • R. Clint Whaley; Antoine Petitet

  • Affiliations:
  • Computer Science Department, Florida State University, 167 Love Building, Tallahassee, FL 32306-4530, U.S.A.; SUN Microsystems, 42, Avenue d'Iena, 75016 Paris, France

  • Venue:
  • Software—Practice & Experience - Research Articles
  • Year:
  • 2005

Abstract

The Basic Linear Algebra Subprograms (BLAS) define one of the most heavily used performance-critical APIs in scientific computing today. It has long been understood that the most important of these routines, the dense Level 3 BLAS, may be written efficiently given a highly optimized general matrix multiply routine. In this paper, however, we show that an even larger set of operations can be efficiently maintained using a much simpler matrix multiply kernel. Indeed, this is how our own project, ATLAS (which provides one of the most widely used BLAS implementations today), supports a large variety of performance-critical routines. Copyright © 2004 John Wiley & Sons, Ltd.
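
To make the central idea concrete, the following is a minimal C sketch (not ATLAS source; NB, kernel_mm, and blocked_dgemm are hypothetical names chosen for illustration) of how a full matrix multiply, and by extension the Level 3 BLAS layered on top of it, can be driven by a single small fixed-size multiply kernel:

```c
/* Illustrative sketch only: one tiny fixed-size kernel drives a blocked GEMM.
 * Real tuned kernels are far more elaborate; this shows the layering idea. */
#include <stdio.h>

#define NB 4   /* blocking factor; a tuned library would select this empirically */

/* NB x NB kernel: C += A * B, column-major storage, shared leading dimension ld */
static void kernel_mm(const double *A, const double *B, double *C, int ld)
{
    for (int j = 0; j < NB; j++)
        for (int i = 0; i < NB; i++) {
            double cij = C[i + j*ld];
            for (int k = 0; k < NB; k++)
                cij += A[i + k*ld] * B[k + j*ld];
            C[i + j*ld] = cij;
        }
}

/* Blocked C += A * B for square matrices of order n (n a multiple of NB here) */
static void blocked_dgemm(int n, const double *A, const double *B, double *C)
{
    for (int j = 0; j < n; j += NB)
        for (int k = 0; k < n; k += NB)
            for (int i = 0; i < n; i += NB)
                kernel_mm(&A[i + k*n], &B[k + j*n], &C[i + j*n], n);
}

int main(void)
{
    enum { N = 8 };
    double A[N*N], B[N*N], C[N*N];
    for (int i = 0; i < N*N; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }
    blocked_dgemm(N, A, B, C);
    printf("C[0] = %g (expected %g)\n", C[0], 2.0 * N);
    return 0;
}
```

As the abstract indicates, only the small on-chip kernel needs aggressive, platform-specific optimization; the blocked drivers and the remaining Level 3 routines can then be written once in portable code and reused across architectures.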