A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers

  • Authors: J. Choi

  • Venue: IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
  • Year: 1997

Abstract

The author presents DIMMA (distribution-independent matrix multiplication algorithm), a fast and scalable matrix multiplication algorithm for distributed-memory concurrent computers whose performance is independent of how the data are distributed across processors. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine on each processor, whether the block size is too small or too large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
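
As a rough illustration of the arithmetic behind the LCM block idea, the sketch below assumes a two-dimensional block-cyclic distribution on a P x Q process grid, which the abstract does not spell out. Under that assumption, block index k lives in process row k mod P and process column k mod Q, so the pair of owners repeats with period lcm(P, Q). Blocks that share an owner pair could in principle be multiplied together in one larger BLAS call. The function names `owners` and `group_by_owner` are hypothetical and are not taken from the paper; this is not the author's implementation.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def owners(k, p, q):
    """Owner pair of block index k under an assumed block-cyclic layout.

    Returns (process row, process column) = (k mod p, k mod q).
    """
    return (k % p, k % q)


def group_by_owner(num_blocks, p, q):
    """Group block indices 0..num_blocks-1 by their owner pair."""
    groups = {}
    for k in range(num_blocks):
        groups.setdefault(owners(k, p, q), []).append(k)
    return groups


if __name__ == "__main__":
    p, q = 2, 3  # illustrative grid shape, chosen so that p != q
    period = lcm(p, q)
    print("lcm(P, Q) =", period)
    for owner, blocks in sorted(group_by_owner(2 * period, p, q).items()):
        # Indices in each group differ by multiples of lcm(P, Q).
        print(owner, blocks)
```

With P = 2 and Q = 3, block indices 0 and 6 share the owner pair (0, 0), indices 1 and 7 share (1, 1), and so on. Grouping such blocks is one way a small distribution block size could still yield larger local matrix products.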