Communication complexity of PRAMs
Theoretical Computer Science (special issue: Fifteenth International Colloquium on Automata, Languages and Programming, Tampere, Finland, July 1988)
Communication lower bounds for distributed-memory matrix multiplication
Journal of Parallel and Distributed Computing
Graph expansion and communication costs of fast matrix multiplication
SPAA '11: Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
Euro-Par '11: Proceedings of the 17th International Conference on Parallel Processing, Part II
Communication-optimal parallel algorithm for Strassen's matrix multiplication
SPAA '12: Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures
Communication-avoiding parallel Strassen: implementation and performance
SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Graph expansion and communication costs of fast matrix multiplication
Journal of the ACM
Graph expansion analysis for communication costs of fast rectangular matrix multiplication
MedAlg '12: Proceedings of the First Mediterranean Conference on Design and Analysis of Algorithms
Communication optimal parallel multiplication of sparse random matrices
SPAA '13: Proceedings of the Twenty-Fifth Annual ACM Symposium on Parallelism in Algorithms and Architectures
Communication costs of Strassen's matrix multiplication
Communications of the ACM
A parallel algorithm has perfect strong scaling if its running time on $P$ processors is linear in $1/P$, including all communication costs. Distributed-memory parallel matrix multiplication algorithms with perfect strong scaling have only recently been found: one is based on classical matrix multiplication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to a certain number of processors, beyond which the inter-processor communication no longer scales. We obtain memory-independent communication-cost lower bounds for classical and Strassen-based distributed-memory matrix multiplication algorithms. These bounds imply that no classical or Strassen-based parallel matrix multiplication algorithm can scale perfectly beyond the ranges already attained by the two algorithms mentioned above. The memory-independent bounds and the strong-scaling bounds generalize to other algorithms.
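As a quantitative sketch of the bounds (the notation here is assumed, since the abstract does not define it): let $n$ be the matrix dimension, $M$ the size of each processor's local memory, and $W$ the number of words communicated per processor. The lower bounds in the cited papers take the form

$$W_{\mathrm{classical}} = \Omega\!\left(\max\left\{\frac{n^3}{P\sqrt{M}},\; \frac{n^2}{P^{2/3}}\right\}\right), \qquad W_{\mathrm{Strassen}} = \Omega\!\left(\max\left\{\frac{n^{\omega_0}}{P\,M^{\omega_0/2-1}},\; \frac{n^2}{P^{2/\omega_0}}\right\}\right),$$

where $\omega_0 = \log_2 7 \approx 2.81$ is the exponent of Strassen's algorithm. Perfect strong scaling is possible only while the first, memory-dependent term dominates; equating the two terms gives the limits of the perfect-strong-scaling range, $P = O(n^3/M^{3/2})$ in the classical case and $P = O(n^{\omega_0}/M^{\omega_0/2})$ in the Strassen case.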