The main purpose of this paper is to present a fast matrix multiplication algorithm taken from the paper of Laderman et al. (Linear Algebra Appl. 162-164 (1992) 557) in a refined, compact "analytical" form, and to demonstrate that it can be implemented as quite efficient computer code. Our improved presentation enables us to substantially simplify the analysis of the computational complexity and numerical stability of the algorithm, as well as its computer implementation. The algorithm multiplies two N × N matrices using O(N^2.7760) arithmetic operations. In the case where N = 18 · 48^k, for a positive integer k, the total number of flops required by the algorithm is 4.894 N^2.7760 - 16.165 N^2, which may be compared with a similar estimate for the Winograd algorithm, 3.732 N^2.8074 - 5 N^2 flops at N = 8 · 2^k, the latter being the current record bound among all known practical algorithms. Moreover, we present pseudo-code of the algorithm which demonstrates its very moderate working-memory requirements, much smaller than those of the best available implementations of the Strassen and Winograd algorithms. For matrices of medium-large size (say, 2000 ≤ N
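The two flop-count estimates above can be compared numerically. The following is a minimal sketch: the formulas (4.894 N^2.7760 - 16.165 N^2 at N = 18 · 48^k, and 3.732 N^2.8074 - 5 N^2 at N = 8 · 2^k) are taken directly from the text, while the function names and the printed table are illustrative assumptions, not part of either algorithm's implementation.

```python
# Hedged sketch: evaluate the flop-count estimates quoted in the abstract
# at their respective admissible matrix sizes. Only the two formulas come
# from the paper; everything else (names, loop bounds, output format) is
# illustrative.

def laderman_flops(k: int) -> tuple[int, float]:
    """Flop estimate for the refined Laderman et al. algorithm, N = 18 * 48**k."""
    n = 18 * 48**k
    return n, 4.894 * n**2.7760 - 16.165 * n**2

def winograd_flops(k: int) -> tuple[int, float]:
    """Flop estimate for the Winograd algorithm, N = 8 * 2**k."""
    n = 8 * 2**k
    return n, 3.732 * n**2.8074 - 5.0 * n**2

if __name__ == "__main__":
    # Compare a few admissible sizes for each algorithm.
    for k in range(3):
        n, f = laderman_flops(k)
        print(f"Laderman et al.: N = {n:8d}, ~{f:.3e} flops")
    for k in range(9):
        n, f = winograd_flops(k)
        print(f"Winograd:        N = {n:8d}, ~{f:.3e} flops")
```

Note that the admissible sizes differ (powers of 48 times 18 versus powers of 2 times 8), so the two estimates are directly comparable only near sizes where the sequences roughly coincide.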