A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Introduction to parallel computing: design and analysis of algorithms
Introduction to parallel computing: design and analysis of algorithms
IBM Journal of Research and Development
SUMMA: Scalable Universal Matrix Multiplication Algorithm
SUMMA: Scalable Universal Matrix Multiplication Algorithm
A cellular computer to implement the kalman filter algorithm
A cellular computer to implement the kalman filter algorithm
Heterogeneous Networks of Workstations and the Parallel Matrix Multiplication
Proceedings of the 8th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Hi-index | 0.00 |
We present a new parallel matrix multiplication algorithm on distributed memory concurrent computers, which is fast and scalable, and whose performance is independent of data distribution on processors, and call it DIMMA (Distribution-Independent Matrix Multiplication Algorithm). The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectively, and exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine in each processor even when the block size is very small as well as very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.