LAPACK's user's guide
A three-dimensional approach to parallel matrix multiplication
IBM Journal of Research and Development
I/O complexity: The red-blue pebble game
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
A cellular computer to implement the kalman filter algorithm
A cellular computer to implement the kalman filter algorithm
Communication lower bounds for distributed-memory matrix multiplication
Journal of Parallel and Distributed Computing
Numerische Mathematik
An elementary construction of constant-degree expanders
Combinatorics, Probability and Computing
Graph expansion and communication costs of fast matrix multiplication: regular submission
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
The Future of Computing Performance: Game Over or Next Level?
The Future of Computing Performance: Game Over or Next Level?
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Multiplying matrices faster than coppersmith-winograd
STOC '12 Proceedings of the forty-fourth annual ACM symposium on Theory of computing
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Communication-optimal parallel algorithm for strassen's matrix multiplication
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Communication-avoiding parallel strassen: implementation and performance
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Graph expansion and communication costs of fast matrix multiplication
Journal of the ACM (JACM)
Graph expansion analysis for communication costs of fast rectangular matrix multiplication
MedAlg'12 Proceedings of the First Mediterranean conference on Design and Analysis of Algorithms
Communication optimal parallel multiplication of sparse random matrices
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication
IPDPS '13 Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
Perfect Strong Scaling Using No Additional Energy
IPDPS '13 Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
Hi-index | 48.22 |
Algorithms have historically been evaluated in terms of the number of arithmetic operations they performed. This analysis is no longer sufficient for predicting running times on today's machines. Moving data through memory hierarchies and among processors requires much more time (and energy) than performing computations. Hardware trends suggest that the relative costs of this communication will only increase. Proving lower bounds on the communication of algorithms and finding algorithms that attain these bounds are therefore fundamental goals. We show that the communication cost of an algorithm is closely related to the graph expansion properties of its corresponding computation graph. Matrix multiplication is one of the most fundamental problems in scientific computing and in parallel computing. Applying expansion analysis to Strassen's and other fast matrix multiplication algorithms, we obtain the first lower bounds on their communication costs. These bounds show that the current sequential algorithms are optimal but that previous parallel algorithms communicate more than necessary. Our new parallelization of Strassen's algorithm is communication-optimal and outperforms all previous matrix multiplication algorithms.