A family of high-performance matrix multiplication algorithms

  • Authors:
  • John A. Gunnels; Fred G. Gustavson; Greg M. Henry; Robert A. van de Geijn

  • Affiliations:
  • IBM T.J. Watson Research Center; IBM T.J. Watson Research Center; Intel Corporation; The University of Texas, Austin

  • Venue:
  • PARA'04 Proceedings of the 7th International Conference on Applied Parallel Computing: State of the Art in Scientific Computing
  • Year:
  • 2004


Abstract

We describe a model of hierarchical memories and use it to determine an optimal strategy for blocking the operand matrices of matrix multiplication. The model extends an earlier related model by three of the authors. As before, the model predicts the form of current, state-of-the-art L1 kernels. Additionally, it shows that current L1 kernels can continue to deliver their high performance on operand matrices as large as the L2 cache. For a hierarchical memory with L memory levels (main memory and L-1 caches), our model reduces the number of potential matrix multiplication algorithms from 6^L to four, and we use the shapes of the matrix operands to select one of the four. Previously, the count was 2^L and the model was independent of the operand shapes. Because of space limitations, we do not include performance results.
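The blocking the abstract refers to can be illustrated with a simple sketch. At each memory level, a matrix multiplication C = A * B is partitioned into blocks sized to fit that level's cache, and the three block loops can be ordered in 6 ways, giving the 6^L algorithm space the model prunes. The sketch below (plain Python, a hypothetical `matmul_blocked` helper with block size `nb`, not the authors' L1 kernel) shows one such blocking for a single memory level:

```python
def matmul_blocked(A, B, nb=2):
    """Compute C = A * B with one level of cache blocking.

    A is m-by-k, B is k-by-n, given as lists of lists.
    nb is the block size; in a real kernel it is chosen so the
    active blocks of A, B, and C fit in the targeted cache level.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    # The three block loops (ib, pb, jb) can be nested in any of
    # 3! = 6 orders; with L memory levels, that yields 6^L candidate
    # algorithms, which the model reduces to four.
    for ib in range(0, m, nb):
        for pb in range(0, k, nb):
            for jb in range(0, n, nb):
                # Multiply one block of A by one block of B,
                # accumulating into the corresponding block of C.
                for i in range(ib, min(ib + nb, m)):
                    for p in range(pb, min(pb + nb, k)):
                        a = A[i][p]
                        for j in range(jb, min(jb + nb, n)):
                            C[i][j] += a * B[p][j]
    return C

# Example: a 3x2 times 2x3 product.
C = matmul_blocked([[1, 2], [3, 4], [5, 6]],
                   [[7, 8, 9], [10, 11, 12]])
```

In a production kernel the innermost block multiply would be a hand-tuned L1 kernel, and the operand shapes would determine which of the four surviving loop orderings to use at the outer levels.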