Combining building blocks for parallel multi-level matrix multiplication

  • Authors: S. Hunold; T. Rauber; G. Rünger

  • Affiliations: Department of Mathematics, Physics, and Computer Science, University of Bayreuth, Germany; Department of Mathematics, Physics, and Computer Science, University of Bayreuth, Germany; Department of Computer Science, Chemnitz University of Technology, Germany

  • Venue: Parallel Computing
  • Year: 2008

Abstract

This paper presents parallel algorithms for matrix-matrix multiplication that are built from several algorithms arranged in a multi-level structure. The upper level consists of Strassen's algorithm, which is applied for a predefined number of recursion steps; this number can be adapted to the specific execution platform. The intermediate level is performed by a parallel non-hierarchical algorithm, and the lower level uses efficient single-processor implementations of matrix-matrix multiplication such as BLAS or ATLAS. Both the number of Strassen recursion steps and the specific algorithms used at the intermediate and lower levels can be chosen freely, yielding a variety of different multi-level algorithms. Each level of a multi-level algorithm is associated with a hierarchical partition of the set of available processors into disjoint subsets, so that deeper levels of the algorithm employ smaller groups of processors in parallel. The algorithms are expressed in the multiprocessor task programming model and are coded with the runtime library Tlib. Performance experiments on several parallel platforms show that the multi-level algorithms can lead to significant performance gains.
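The multi-level structure described in the abstract can be illustrated with a minimal sequential sketch. It is not the authors' implementation and does not use Tlib. The function names (`multilevel_multiply`, `split_group`, `leaf_multiply`), the choice of seven processor groups per recursion step (one per Strassen product), and the round-robin assignment of ranks are all assumptions made for illustration. The plain triple loop in `leaf_multiply` stands in for both the intermediate parallel algorithm and the lower-level BLAS or ATLAS call, and processor groups are only lists of rank numbers that are recorded, not real parallel tasks.

```python
def leaf_multiply(a, b):
    """Stand-in for the intermediate and lower levels (e.g. a BLAS call)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def split(m):
    """Split a square matrix of even order into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def join(c11, c12, c21, c22):
    """Reassemble four quadrants into one matrix."""
    return ([l + r for l, r in zip(c11, c12)] +
            [l + r for l, r in zip(c21, c22)])


def split_group(procs, parts=7):
    """Partition a list of ranks into disjoint, non-empty subsets (assumed scheme)."""
    parts = min(parts, len(procs))
    return [procs[i::parts] for i in range(parts)]


def multilevel_multiply(a, b, depth, procs, log=None):
    """Apply `depth` Strassen steps, then hand over to leaf_multiply.

    `procs` is the group of ranks responsible for this product; each
    recursion step passes a smaller disjoint subgroup to each sub-product.
    `log`, if given, collects (matrix order, ranks) for every leaf product.
    """
    if depth == 0 or len(a) % 2 != 0:
        if log is not None:
            log.append((len(a), tuple(procs)))
        return leaf_multiply(a, b)

    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    groups = split_group(procs)

    # Operand pairs of the seven Strassen products M1..M7.
    operands = [
        (add(a11, a22), add(b11, b22)),
        (add(a21, a22), b11),
        (a11, sub(b12, b22)),
        (a22, sub(b21, b11)),
        (add(a11, a12), b22),
        (sub(a21, a11), add(b11, b12)),
        (sub(a12, a22), add(b21, b22)),
    ]
    # In a real multiprocessor-task program these seven calls would run
    # concurrently on their groups; here they run one after another.
    m1, m2, m3, m4, m5, m6, m7 = [
        multilevel_multiply(x, y, depth - 1, groups[i % len(groups)], log)
        for i, (x, y) in enumerate(operands)
    ]

    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    return join(c11, c12, c21, c22)
```

Calling `multilevel_multiply(a, b, depth, list(range(p)))` with different values of `depth` mimics the tuning knob mentioned in the abstract: `depth = 0` skips Strassen entirely, while larger values trade more additions for fewer leaf multiplications on smaller processor groups. The sketch assumes square matrices and falls back to the leaf routine when the order is odd.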