Implementation of Strassen's algorithm for matrix multiplication

Authors:
Steven Huss-Lederman; Elaine M. Jacobson; Anna Tsao; Thomas Turnbull; Jeremy R. Johnson
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison, 1210 W. Dayton St., Madison, WI; Center for Computing Sciences, 17100 Science Dr., Bowie, MD; Center for Computing Sciences, 17100 Science Dr., Bowie, MD; Center for Computing Sciences, 17100 Science Dr., Bowie, MD; Department of Mathematics and Computer Science, Drexel University, Philadelphia, PA
Venue:
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Year:
1996

Abstract

We report on the development of an efficient and portable implementation of Strassen's matrix multiplication algorithm for matrices of arbitrary size. Our criterion for stopping the recursion is more detailed than those generally used, which yields enhanced performance for a larger set of input sizes. In addition, we handle odd matrix dimensions with a method whose usefulness had previously been questioned and had not yet been demonstrated. Our memory requirements are also lower, in certain cases by 40 to more than 70 percent relative to other similar implementations. We measure the performance of our code on the IBM RS/6000, CRAY YMP C90, and CRAY T3D single processor, and compare it with other codes. Finally, we demonstrate the usefulness of our implementation by using it to perform the matrix multiplications in a large application code.
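
To make the ideas in the abstract concrete, here is a minimal Python sketch of Strassen's recursion with a stopping criterion and one possible treatment of odd dimensions. It is an illustration under stated assumptions, not the authors' code. The abstract does not give the paper's cutoff rule or name its odd-dimension method, so the single `cutoff` threshold is a deliberately simple stand-in, and the odd-dimension handling (peel off the extra row or column, then patch the result with conventional arithmetic) is an assumed technique. All function names are hypothetical, and the sketch makes no attempt at the memory savings the abstract reports.

```python
def matmul_naive(A, B):
    """Conventional product of A (m x k) and B (k x n), stored as lists of rows."""
    k, n = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(n)]
            for row in A]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block(M, r0, r1, c0, c1):
    """Copy the submatrix M[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in M[r0:r1]]

def strassen(A, B, cutoff=8):
    """Multiply A (m x k) by B (k x n) using Strassen's seven-product recursion.

    Assumed stopping rule: fall back to the conventional product once any
    dimension is at or below `cutoff` (a simplification for illustration).
    """
    m, k, n = len(A), len(B), len(B[0])
    if min(m, k, n) <= max(cutoff, 1):
        return matmul_naive(A, B)

    # Peel: work on the largest even-sized leading parts of A and B.
    me, ke, ne = m - m % 2, k - k % 2, n - n % 2
    h, q, w = me // 2, ke // 2, ne // 2
    A11, A12 = block(A, 0, h, 0, q), block(A, 0, h, q, ke)
    A21, A22 = block(A, h, me, 0, q), block(A, h, me, q, ke)
    B11, B12 = block(B, 0, q, 0, w), block(B, 0, q, w, ne)
    B21, B22 = block(B, q, ke, 0, w), block(B, q, ke, w, ne)

    # Strassen's seven recursive products.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)

    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    C = ([r1 + r2 for r1, r2 in zip(C11, C12)] +
         [r1 + r2 for r1, r2 in zip(C21, C22)])

    # Fix-up steps for the peeled row/column, done conventionally.
    if k % 2:  # rank-one update from A's last column and B's last row
        for i in range(me):
            a = A[i][k - 1]
            for j in range(ne):
                C[i][j] += a * B[k - 1][j]
    if n % 2:  # last column of C for the first `me` rows
        for i in range(me):
            C[i].append(sum(A[i][p] * B[p][n - 1] for p in range(k)))
    if m % 2:  # last row of C, all n columns
        C.append([sum(A[m - 1][p] * B[p][j] for p in range(k))
                  for j in range(n)])
    return C
```

Calling `strassen(A, B, cutoff=c)` on integer matrices should agree exactly with `matmul_naive(A, B)` for any shapes with matching inner dimension. In practice the best cutoff depends on the machine and on the matrix shape, which is why a criterion more detailed than a single threshold, as the abstract describes, can pay off.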