Communication costs of Strassen's matrix multiplication

Authors:
Grey Ballard; James Demmel; Olga Holtz; Oded Schwartz
Affiliations:
University of California, Berkeley, CA; University of California, Berkeley, CA; Technische Universität Berlin, Germany; University of California, Berkeley, CA
Venue:
Communications of the ACM
Year:
2014


Abstract

Algorithms have historically been evaluated by the number of arithmetic operations they perform. This analysis is no longer sufficient for predicting running times on today's machines: moving data through memory hierarchies and among processors requires much more time (and energy) than performing computations, and hardware trends suggest that the relative cost of this communication will only increase. Proving lower bounds on the communication of algorithms and finding algorithms that attain these bounds are therefore fundamental goals. We show that the communication cost of an algorithm is closely related to the graph expansion properties of its corresponding computation graph. Matrix multiplication is one of the most fundamental problems in scientific computing and in parallel computing. Applying expansion analysis to Strassen's and other fast matrix multiplication algorithms, we obtain the first lower bounds on their communication costs. These bounds show that the current sequential algorithms are optimal but that previous parallel algorithms communicate more than necessary. Our new parallelization of Strassen's algorithm is communication-optimal and outperforms all previous matrix multiplication algorithms.
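
For readers unfamiliar with the algorithm the abstract analyzes, the following is a minimal sketch of the textbook sequential Strassen recursion. It is not the communication-optimal parallelization described in the paper. The function names and the optional `counter` argument are illustrative assumptions, and the sketch assumes square matrices whose dimension is a power of two.

```python
def _add(X, Y):
    """Elementwise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def _sub(X, Y):
    """Elementwise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def _split(X):
    """Split a matrix into its four quadrants: X11, X12, X21, X22."""
    h = len(X) // 2
    return ([row[:h] for row in X[:h]], [row[h:] for row in X[:h]],
            [row[:h] for row in X[h:]], [row[h:] for row in X[h:]])


def strassen(A, B, counter=None):
    """Multiply n x n matrices A and B, n a power of two (assumed).

    If `counter` is a one-element list, it is incremented once per
    scalar multiplication (hypothetical helper for illustration).
    """
    n = len(A)
    if n == 1:
        if counter is not None:
            counter[0] += 1
        return [[A[0][0] * B[0][0]]]

    A11, A12, A21, A22 = _split(A)
    B11, B12, B21, B22 = _split(B)

    # Seven recursive products instead of the classical eight.
    M1 = strassen(_add(A11, A22), _add(B11, B22), counter)
    M2 = strassen(_add(A21, A22), B11, counter)
    M3 = strassen(A11, _sub(B12, B22), counter)
    M4 = strassen(A22, _sub(B21, B11), counter)
    M5 = strassen(_add(A11, A12), B22, counter)
    M6 = strassen(_sub(A21, A11), _add(B11, B12), counter)
    M7 = strassen(_sub(A12, A22), _add(B21, B22), counter)

    # Combine the products into the quadrants of C = A * B.
    C11 = _add(_sub(_add(M1, M4), M5), M7)
    C12 = _add(M3, M5)
    C21 = _add(M2, M4)
    C22 = _add(_add(_sub(M1, M2), M3), M6)

    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

Because each level of recursion issues seven half-size products, a 4 x 4 multiplication uses 49 scalar multiplications in this sketch, compared with 64 for the classical algorithm. The sketch counts only arithmetic; it does not model the data movement that the paper's bounds address.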