Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance

  • Authors:
  • Gianfranco Bilardi; Paolo D'Alberto; Alexandru Nicolau

  • Venue:
  • WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
  • Year:
  • 2001

Abstract

The practical portability of a simple version of matrix multiplication is demonstrated. The multiplication algorithm is designed to exploit maximal and predictable locality at all levels of the memory hierarchy, with no a priori knowledge of the specific memory system organization of any particular machine. By both simulations and execution on a number of platforms, we show that portability across memory hierarchies does not sacrifice floating-point performance; indeed, performance is always a significant fraction of peak and, on at least one machine, is higher than that of the tuned routines from both ATLAS and the vendor. The results are obtained by careful algorithm engineering, which combines a number of known as well as novel implementation ideas. This effort can be viewed as an experimental case study, complementary to the theoretical investigations on portability of cache performance begun by Bilardi and Peserico.
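
The idea of exploiting locality at every level of the hierarchy without knowing cache sizes can be illustrated with a generic recursive, quadrant-splitting multiplication. The sketch below is not the authors' implementation (the abstract does not give their data layout or scheduling); the function names `recursive_matmul` and `matmul`, the `base` cutoff, and the restriction to square matrices whose size is a power of two are all assumptions made for illustration.

```python
def recursive_matmul(A, B, C, ri, ci, ki, n, base=2):
    """Accumulate an n x n block product into C.

    Computes C[ri:ri+n, ci:ci+n] += A[ri:ri+n, ki:ki+n] * B[ki:ki+n, ci:ci+n].
    Splitting into quadrants means that, at some recursion depth, the working
    set fits in each cache level, whatever that level's size happens to be.
    """
    if n <= base:
        # Leaf: plain triple loop on a small block (hypothetical cutoff).
        for i in range(n):
            for j in range(n):
                acc = C[ri + i][ci + j]
                for k in range(n):
                    acc += A[ri + i][ki + k] * B[ki + k][ci + j]
                C[ri + i][ci + j] = acc
        return
    h = n // 2
    # Eight half-size sub-products, two per quadrant of C.
    for di in (0, h):
        for dj in (0, h):
            for dk in (0, h):
                recursive_matmul(A, B, C, ri + di, ci + dj, ki + dk, h, base)


def matmul(A, B, base=2):
    """Multiply two square matrices given as lists of lists.

    Assumption of this sketch: the size is a power of two.
    """
    n = len(A)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("this sketch assumes a power-of-two matrix size")
    C = [[0.0] * n for _ in range(n)]
    recursive_matmul(A, B, C, 0, 0, 0, n, base)
    return C
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], base=1)` forces one level of recursion before reaching 1 x 1 leaves. Note that the recursion contains no machine-specific parameter other than the leaf cutoff, which affects loop overhead rather than correctness.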