Optimizing Parallel Multiplication Operation for Rectangular and Transposed Matrices

  • Authors:
  • Manojkumar Krishnan; Jarek Nieplocha

  • Affiliations:
  • Pacific Northwest National Laboratory; Pacific Northwest National Laboratory

  • Venue:
  • ICPADS '04: Proceedings of the Tenth International Conference on Parallel and Distributed Systems
  • Year:
  • 2004

Abstract

In many applications, matrix multiplication involves matrices of different shapes. The shape of the matrices can significantly impact the performance of a matrix multiplication algorithm. This paper describes extensions of the SRUMMA parallel matrix multiplication algorithm [1] that improve performance for transposed and rectangular matrices. Our approach relies on a set of hybrid algorithms chosen based on the shapes of the matrices and the transpose operators involved. The algorithm exploits the performance characteristics of clusters and shared memory systems: it differs from other parallel matrix multiplication algorithms in its explicit use of shared memory and remote memory access (RMA) communication rather than message passing. Experimental results on clusters and shared memory systems demonstrate consistent performance advantages over pdgemm from the ScaLAPACK parallel linear algebra package.
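The abstract's central idea is dispatching among multiplication variants based on matrix shape and transpose flags. A minimal serial sketch of that dispatch pattern is below; it is illustrative only (not the paper's SRUMMA implementation, which uses parallel RMA communication), and the names `choose_variant`, `matmul`, and the 4x aspect-ratio threshold are hypothetical choices for this example.

```python
# Illustrative sketch of shape-based algorithm selection for C = op(A) * op(B).
# Not the paper's code; thresholds and names are assumptions for the example.

def choose_variant(m, n, k, trans_a=False, trans_b=False):
    """Pick a hybrid-algorithm label from the problem shape, in the spirit
    of the abstract's shape/transpose-driven selection."""
    if trans_a or trans_b:
        return "transpose-aware"
    if max(m, n, k) >= 4 * min(m, n, k):  # strongly rectangular (assumed cutoff)
        return "rectangular"
    return "square-blocked"

def matmul(a, b, trans_a=False, trans_b=False):
    """Reference triple-loop multiply on lists of lists. For clarity this
    sketch materializes the transposes up front; the paper's transpose-aware
    variants instead avoid an explicit transposition/communication step."""
    if trans_a:
        a = [list(col) for col in zip(*a)]
    if trans_b:
        b = [list(col) for col in zip(*b)]
    m, k, n = len(a), len(a[0]), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            aip = a[i][p]  # hoist A[i][p] out of the inner loop
            for j in range(n):
                c[i][j] += aip * b[p][j]
    return c
```

In the paper's setting the chosen variant would also determine the RMA communication pattern (which remote blocks each process gets), not just the loop order shown here.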