Communication complexity of PRAMs
Theoretical Computer Science (special issue: Fifteenth International Colloquium on Automata, Languages and Programming, Tampere, Finland, July 1988)
Communication lower bounds for distributed-memory matrix multiplication
Journal of Parallel and Distributed Computing
Graph expansion and communication costs of fast matrix multiplication
SPAA '11: Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
Euro-Par '11: Proceedings of the 17th International Conference on Parallel Processing, Part II
Communication-optimal parallel algorithm for Strassen's matrix multiplication
SPAA '12: Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures
Communication-avoiding parallel Strassen: implementation and performance
SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Graph expansion and communication costs of fast matrix multiplication
Journal of the ACM
Graph expansion analysis for communication costs of fast rectangular matrix multiplication
MedAlg '12: Proceedings of the First Mediterranean Conference on Design and Analysis of Algorithms
Communication optimal parallel multiplication of sparse random matrices
SPAA '13: Proceedings of the Twenty-Fifth Annual ACM Symposium on Parallelism in Algorithms and Architectures
Communication costs of Strassen's matrix multiplication
Communications of the ACM
A parallel algorithm has perfect strong scaling if its running time on $P$ processors is linear in $1/P$, including all communication costs. Distributed-memory parallel matrix multiplication algorithms with perfect strong scaling have only recently been found: one is based on classical matrix multiplication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to a certain number of processors, beyond which the inter-processor communication no longer scales. We obtain memory-independent communication-cost lower bounds for classical and Strassen-based distributed-memory matrix multiplication algorithms. These bounds imply that no classical or Strassen-based parallel matrix multiplication algorithm can scale perfectly beyond the ranges already attained by the two algorithms mentioned above. The memory-independent bounds and the strong-scaling bounds generalize to other algorithms.
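As a quantitative sketch of the bounds (the notation here is assumed, since the abstract does not define it): let $n$ be the matrix dimension, $M$ the size of each processor's local memory, and $W$ the number of words communicated per processor. The lower bounds in the cited papers take the form

$$W_{\mathrm{classical}} = \Omega\!\left(\max\left\{\frac{n^3}{P\sqrt{M}},\; \frac{n^2}{P^{2/3}}\right\}\right), \qquad W_{\mathrm{Strassen}} = \Omega\!\left(\max\left\{\frac{n^{\omega_0}}{P\,M^{\omega_0/2-1}},\; \frac{n^2}{P^{2/\omega_0}}\right\}\right),$$

where $\omega_0 = \log_2 7 \approx 2.81$ is the exponent of Strassen's algorithm. Perfect strong scaling is possible only while the first, memory-dependent term dominates; equating the two terms gives the limits of the perfect-strong-scaling range, $P = O(n^3/M^{3/2})$ in the classical case and $P = O(n^{\omega_0}/M^{\omega_0/2})$ in the Strassen case.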