Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication

Authors:
James Demmel;David Eliahu;Armando Fox;Shoaib Kamil;Benjamin Lipshitz;Oded Schwartz;Omer Spillinger
Venue:
IPDPS '13 Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
Year:
2013

Abstract

Communication-optimal algorithms are known for square matrix multiplication. Here, we obtain the first communication-optimal algorithm for all dimensions of rectangular matrices. Combining the dimension-splitting technique of Frigo, Leiserson, Prokop and Ramachandran (1999) with the recursive BFS/DFS approach of Ballard, Demmel, Holtz, Lipshitz and Schwartz (2012) yields an algorithm that is communication-optimal as well as cache- and network-oblivious. Moreover, the implementation is simple: approximately 50 lines of code for the shared-memory version. Because the new algorithm minimizes communication across the network, between NUMA domains, and between levels of cache, it performs well in practice on both shared- and distributed-memory machines. We show significant speedups over existing parallel linear algebra libraries on both a 32-core shared-memory machine and a distributed-memory supercomputer.
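
To make the dimension-splitting idea concrete, the following is a minimal sequential sketch in Python, not the authors' implementation. It assumes the recursion halves whichever of the three dimensions (m, n, k) is currently largest and falls back to a naive triple loop on small blocks. The names `rect_matmul`, `_recurse`, and the `base` cutoff are hypothetical, chosen only for illustration. The BFS/DFS scheduling of subproblems across processors is not modeled here: both halves simply run one after the other.

```python
def _recurse(A, B, C, i0, i1, j0, j1, k0, k1, base):
    """Accumulate C[i0:i1][j0:j1] += A[i0:i1][k0:k1] * B[k0:k1][j0:j1]."""
    m, n, k = i1 - i0, j1 - j0, k1 - k0

    # Base case: every dimension is small, so use the naive triple loop.
    if m <= base and n <= base and k <= base:
        for i in range(i0, i1):
            for j in range(j0, j1):
                acc = 0
                for p in range(k0, k1):
                    acc += A[i][p] * B[p][j]
                C[i][j] += acc
        return

    # Otherwise split the largest dimension in half and recurse on both halves.
    if m >= n and m >= k:
        mid = i0 + m // 2  # split rows of A and C
        _recurse(A, B, C, i0, mid, j0, j1, k0, k1, base)
        _recurse(A, B, C, mid, i1, j0, j1, k0, k1, base)
    elif n >= k:
        mid = j0 + n // 2  # split columns of B and C
        _recurse(A, B, C, i0, i1, j0, mid, k0, k1, base)
        _recurse(A, B, C, i0, i1, mid, j1, k0, k1, base)
    else:
        mid = k0 + k // 2  # split the shared dimension; both halves add into C
        _recurse(A, B, C, i0, i1, j0, j1, k0, mid, base)
        _recurse(A, B, C, i0, i1, j0, j1, mid, k1, base)


def rect_matmul(A, B, base=2):
    """Multiply an m-by-k matrix A by a k-by-n matrix B (lists of lists)."""
    if base < 1:
        raise ValueError("base must be at least 1")
    m, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions do not match")
    n = len(B[0])
    C = [[0] * n for _ in range(m)]
    _recurse(A, B, C, 0, m, 0, n, 0, k, base)
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
    B = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
    print(rect_matmul(A, B))
```

In this sketch, the two subproblems produced by splitting m or n write to disjoint blocks of C and could run independently, whereas the two subproblems produced by splitting k both accumulate into the same block of C, so a parallel version would need to combine their partial results.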