Algorithm 898: Efficient multiplication of dense matrices over GF(2)

Authors:
Martin Albrecht; Gregory Bard; William Hart
Affiliations:
Royal Holloway, University of London, United Kingdom; Fordham University, Bronx, NY; University of Warwick, United Kingdom
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
2010

Abstract

We describe an efficient implementation of a hierarchy of algorithms for multiplication of dense matrices over the field with two elements (F2). In particular, we present our implementation, in the M4RI library, of Strassen-Winograd matrix multiplication and the “Method of the Four Russians for Multiplication” (M4RM), and compare it against other available implementations. Good performance is demonstrated on AMD's Opteron processor and particularly good performance on Intel's Core 2 Duo processor. The open-source M4RI library is available as a stand-alone package as well as part of the Sage mathematics system.

In machine terms, addition in F2 is logical-XOR and multiplication is logical-AND, so a machine word of 64 bits allows one to operate on 64 elements of F2 in parallel: at most one CPU cycle for 64 parallel additions or multiplications. As such, element-wise operations over F2 are relatively cheap. In fact, in this paper, we conclude that the actual bottlenecks are memory reads and writes and issues of data locality. We present our empirical findings in relation to minimizing these and give an analysis thereof.
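The word-parallel arithmetic described in the abstract can be illustrated with a small sketch. It is not the M4RI API: the function names (`gf2_add`, `gf2_mul`, `matmul_rowwise`, `matmul_table`) and the packing convention (bit j of an integer holds column j of a row) are assumptions made for this example, and Python integers stand in for 64-bit machine words. The table-based variant follows the general idea usually associated with the Method of the Four Russians (precompute every XOR combination of k rows, then look results up), without the Gray-code ordering or cache tuning a real implementation would use.

```python
"""Bit-packed GF(2) arithmetic: an illustrative sketch, not the M4RI API."""


def gf2_add(u: int, v: int) -> int:
    # Addition in GF(2) is XOR; one operation adds every packed entry.
    return u ^ v


def gf2_mul(u: int, v: int) -> int:
    # Element-wise multiplication in GF(2) is AND.
    return u & v


def matmul_rowwise(A, B):
    """Return A*B over GF(2).

    A and B are lists of packed rows (hypothetical convention: bit j of a
    row is the entry in column j). A has len(B) columns.
    """
    C = []
    for a in A:
        acc = 0
        for i, b in enumerate(B):
            if (a >> i) & 1:      # entry A[row][i] is 1
                acc ^= b          # add row i of B to the result row
        C.append(acc)
    return C


def matmul_table(A, B, k=4):
    """Same product, using lookup tables over blocks of k rows of B."""
    C = [0] * len(A)
    for start in range(0, len(B), k):
        block = B[start:start + k]
        # table[x] = XOR of the block rows selected by the set bits of x
        table = [0] * (1 << len(block))
        for x in range(1, len(table)):
            low = x & -x          # lowest set bit of x
            table[x] = table[x ^ low] ^ block[low.bit_length() - 1]
        mask = (1 << len(block)) - 1
        for j, a in enumerate(A):
            # k bits of row j of A select one precomputed combination
            C[j] ^= table[(a >> start) & mask]
    return C


# A = [[1,0,1],[0,1,1]] and B = [[1,1],[0,1],[1,0]] in packed form
A = [0b101, 0b110]
B = [0b11, 0b10, 0b01]
print(matmul_rowwise(A, B))       # packed rows of the 2x2 product
print(matmul_table(A, B, k=2))    # same result via table lookups
```

Both functions touch each row of B many times, which is consistent with the abstract's point that memory traffic, not the XOR and AND operations themselves, dominates the cost.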