The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication algorithms, and obtain the first lower bounds on their communication costs. In the sequential case, where the processor has a fast memory of size $M$, too small to store three $n$-by-$n$ matrices, the lower bound on the number of words moved between fast and slow memory is, for a large class of matrix multiplication algorithms, $\Omega\left((n/\sqrt{M})^{\omega_0} \cdot M\right)$, where $\omega_0$ is the exponent in the arithmetic count (e.g., $\omega_0 = \lg 7$ for Strassen, and $\omega_0 = 3$ for conventional matrix multiplication). With $p$ parallel processors, each with fast memory of size $M$, the lower bound is asymptotically lower by a factor of $p$. These bounds are attainable both for sequential and for parallel algorithms and hence optimal.
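To get a feel for the magnitudes involved, the bound stated above can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names `sequential_lower_bound` and `parallel_lower_bound` are hypothetical, the constant factor hidden by the Ω notation is taken to be 1, and the inputs are assumed to lie in the regime the abstract describes (fast memory too small to hold three n-by-n matrices).

```python
import math

# Exponents of the arithmetic count, as given in the abstract.
OMEGA_STRASSEN = math.log2(7)   # lg 7, roughly 2.807
OMEGA_CLASSICAL = 3.0


def sequential_lower_bound(n, M, omega0):
    """Evaluate (n / sqrt(M))**omega0 * M.

    This is only the asymptotic expression inside the Omega; the hidden
    constant factor is assumed to be 1 for illustration.
    """
    if n <= 0 or M <= 0:
        raise ValueError("n and M must be positive")
    return (n / math.sqrt(M)) ** omega0 * M


def parallel_lower_bound(n, M, omega0, p):
    """Sequential expression divided by the number of processors p."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sequential_lower_bound(n, M, omega0) / p


if __name__ == "__main__":
    n, M = 4096, 1024  # illustrative sizes, chosen so that n / sqrt(M) = 128
    for name, omega0 in [("classical", OMEGA_CLASSICAL),
                         ("Strassen", OMEGA_STRASSEN)]:
        words = sequential_lower_bound(n, M, omega0)
        print(f"{name}: about {words:.3e} words moved (sequential)")
```

With these illustrative sizes, the classical expression reduces to the familiar n³/√M, while the Strassen expression is smaller because its exponent lg 7 is below 3.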