Hardware-oriented implementation of cache oblivious matrix operations based on space-filling curves

  • Authors:
  • Michael Bader; Robert Franz; Stephan Günther; Alexander Heinecke

  • Affiliations:
  • Dept. of Informatics, TU München, München, Germany (all authors)

  • Venue:
  • PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
  • Year:
  • 2007

Abstract

We will present hardware-oriented implementations of block-recursive approaches for matrix operations, especially matrix multiplication and LU decomposition. An element order based on a recursively constructed Peano space-filling curve is used to store the matrix elements. This block-recursive numbering scheme is changed into a standard row-major order as soon as the respective matrix subblocks fit into level-1 cache. For operations on these small blocks, we implemented hardware-oriented kernels optimised for Intel's Core architecture. The resulting matrix-multiplication and LU-decomposition codes compete well with optimised libraries such as Intel's MKL, ATLAS, or GotoBLAS, but have the advantage that only comparatively small and well-defined kernel operations have to be optimised to achieve high performance.
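
The hybrid storage scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `peano_order` and `to_hybrid_layout`, the convention that `x` indexes block columns and `y` block rows, and the requirement that the number of blocks per side be a power of 3 are all assumptions made for illustration. The sketch only shows the layout (blocks visited along a Peano curve, row-major order inside each block), not the multiplication or LU kernels.

```python
def peano_order(k, flip_x=False, flip_y=False):
    """Return the cells (x, y) of a 3**k by 3**k grid in Peano-curve order.

    Hypothetical helper: each 3x3 refinement follows a serpentine pattern,
    and sub-curves are mirrored so that consecutive cells stay adjacent.
    """
    if k == 0:
        return [(0, 0)]
    s = 3 ** (k - 1)  # side length of one sub-square
    cells = []
    for i in range(9):
        cx = i // 3
        # Serpentine: even columns run upwards, odd columns downwards.
        cy = i % 3 if cx % 2 == 0 else 2 - i % 3
        # Mirror the sub-curve depending on the parity of its position.
        sub = peano_order(k - 1,
                          flip_x ^ (cy % 2 == 1),
                          flip_y ^ (cx % 2 == 1))
        # Apply the flips of the current level to the coarse position.
        gx = 2 - cx if flip_x else cx
        gy = 2 - cy if flip_y else cy
        cells.extend((gx * s + x, gy * s + y) for x, y in sub)
    return cells


def to_hybrid_layout(matrix, block):
    """Flatten a square matrix: blocks in Peano order, row-major inside.

    `block` plays the role of the block size below which the recursive
    numbering is replaced by row-major order (in the paper, the size at
    which subblocks fit into level-1 cache; here just a parameter).
    """
    n = len(matrix)
    blocks_per_side = n // block
    k = 0
    while 3 ** k < blocks_per_side:
        k += 1
    if 3 ** k != blocks_per_side or blocks_per_side * block != n:
        raise ValueError("matrix size must be block * 3**k")
    flat = []
    for bx, by in peano_order(k):          # bx: block column, by: block row
        for r in range(block):             # row-major within the block
            row = matrix[by * block + r]
            flat.extend(row[bx * block:(bx + 1) * block])
    return flat


if __name__ == "__main__":
    a = [[i * 6 + j for j in range(6)] for i in range(6)]
    print(to_hybrid_layout(a, 2)[:8])
```

Because consecutive cells of the Peano order are always edge neighbours, consecutive blocks in the flattened array are also neighbours in the matrix, which is the locality property that block-recursive algorithms of this kind rely on.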