A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
IBM Journal of Research and Development
LAPACK Working Note 96: Scalable Universal Matrix Multiplication Algorithm
A cellular computer to implement the Kalman filter algorithm
64-bit floating-point FPGA matrix multiplication
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Memory efficient parallel matrix multiplication operation for irregular problems
Proceedings of the 3rd conference on Computing frontiers
Parallelization of divide-and-conquer eigenvector accumulation
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
The author presents a fast and scalable matrix multiplication algorithm for distributed-memory concurrent computers whose performance is independent of how the data are distributed over the processors, and calls it DIMMA (distribution-independent matrix multiplication algorithm). The algorithm rests on two new ideas. First, it uses a modified pipelined communication scheme to overlap computation and communication effectively. Second, it exploits the LCM block concept to obtain maximum performance from the sequential BLAS routine on each processor, whether the block size is too small or too large. The algorithm is implemented on the Intel Paragon and compared with SUMMA.
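The abstract describes DIMMA only at a high level. The Python sketch below illustrates the arithmetic behind the "LCM block" idea under stated assumptions. It assumes a 2-D block-cyclic layout on a P x Q process grid in which block column k of A lives on processor column k mod Q and block row k of B lives on processor row k mod P. Because the pair (k mod P, k mod Q) repeats with period lcm(P, Q), a window of lcm(P, Q) block indices is a natural scheduling unit. The names `block_owners`, `lcm_order`, and `blocked_matmul` are hypothetical. The grouping rule in `lcm_order` is an illustrative reordering, not the published DIMMA schedule, and the multiplication is a sequential simulation with no communication or pipelining.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def block_owners(k: int, P: int, Q: int) -> tuple[int, int]:
    """Assumed 2-D block-cyclic layout: block row k of B lives on
    processor row k % P, block column k of A on processor column k % Q."""
    return (k % P, k % Q)


def lcm_order(nblocks: int, P: int, Q: int) -> list[list[int]]:
    """Hypothetical LCM-based schedule (illustration only): inside each
    window of lcm(P, Q) block indices, group the indices whose A-blocks
    share a processor column, so each group could be sent as one wider
    panel instead of one narrow block at a time."""
    L = lcm(P, Q)
    groups = []
    for start in range(0, nblocks, L):
        window = range(start, min(start + L, nblocks))
        for q in range(Q):
            g = [k for k in window if k % Q == q]
            if g:
                groups.append(g)
    return groups


def blocked_matmul(A, B, nb, groups):
    """Sequential simulation of C = A * B as a sum of updates
    A[:, S] * B[S, :], one per group S of block indices of width nb."""
    n, m, K = len(A), len(B[0]), len(B)
    C = [[0] * m for _ in range(n)]
    for g in groups:
        # Global inner-dimension indices covered by this group
        # (the last block may be partial when K is not a multiple of nb).
        cols = [k * nb + t for k in g for t in range(nb) if k * nb + t < K]
        for i in range(n):
            for j in range(m):
                C[i][j] += sum(A[i][c] * B[c][j] for c in cols)
    return C


if __name__ == "__main__":
    P, Q, nb, K = 2, 3, 2, 13
    nblocks = -(-K // nb)  # ceiling division: 7 blocks
    print(lcm_order(nblocks, P, Q))  # [[0, 3], [1, 4], [2, 5], [6]]
```

Because each group contributes A[:, S] * B[S, :] for its own index set S, any partition of the block indices yields the same C. A reordering of this kind only changes how many blocks could travel together in one message and how large the inner dimension of each local update is, which is the lever the abstract attributes to the LCM block concept.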