In this paper, we present an efficient dense matrix multiplication algorithm for distributed-memory computers with a hypercube topology. The proposed algorithm outperforms all previously proposed algorithms over a wide range of matrix sizes and numbers of processors, especially for large matrices. We analyze the performance of the algorithms on two types of hypercube architecture: one in which each node can use at most one communication link at a time (to send and receive), and one in which each node can use all of its communication links simultaneously.
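To make the two architecture types concrete, the following is a minimal sketch of the hypercube topology and of a standard binomial-tree broadcast that respects the one-link-at-a-time restriction. It is an illustration only, not the matrix multiplication algorithm of the paper. The function names (`hypercube_neighbors`, `one_port_broadcast_schedule`) and the binary node labelling are assumptions introduced here.

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Return the neighbors of `node` in a `dim`-dimensional hypercube.

    Nodes are labelled 0 .. 2**dim - 1 (an assumed labelling); two nodes
    are linked exactly when their labels differ in one bit.
    """
    if not 0 <= node < (1 << dim):
        raise ValueError("node label out of range")
    return [node ^ (1 << k) for k in range(dim)]


def one_port_broadcast_schedule(root: int, dim: int) -> list[list[tuple[int, int]]]:
    """Binomial-tree broadcast from `root` (illustrative, not the paper's method).

    In step k, every node that already holds the message sends it across
    dimension k. Each node therefore uses at most one link per step, which
    matches the first architecture type described above.
    """
    holders = [root]
    schedule = []
    for k in range(dim):
        step = [(src, src ^ (1 << k)) for src in holders]
        schedule.append(step)
        holders += [dst for _, dst in step]
    return schedule


if __name__ == "__main__":
    dim = 3
    print(hypercube_neighbors(5, dim))
    for k, step in enumerate(one_port_broadcast_schedule(0, dim)):
        print(f"step {k}: {step}")
```

Under the second architecture type, a node could drive all `dim` links returned by `hypercube_neighbors` in the same step, which is why the two models lead to different communication costs for the same algorithm.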