Strassen's Matrix Multiplication on GPUs

  • Authors:
  • Junjie Li;Sanjay Ranka;Sartaj Sahni

  • Venue:
  • ICPADS '11 Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems
  • Year:
  • 2011

Abstract

We provide efficient single-precision and integer GPU implementations of Strassen's algorithm as well as of Winograd's variant. On an NVIDIA C1060 GPU, a speedup of 32% (35%) is obtained for Strassen's 4-level implementation and 33% (36%) for Winograd's variant relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0 when multiplying 16384×16384 matrices. The maximum numerical error for the single-precision implementations is about 2 orders of magnitude higher than that for sgemm when n = 16384 and is zero for the integer implementations.
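
For readers unfamiliar with the recursion the abstract refers to, the following is a minimal CPU sketch of Strassen's algorithm with a fixed number of levels. It is not the authors' GPU code: the function names (`strassen`, `classical`, `split`, `add`, `sub`) and the `levels` parameter are hypothetical, matrices are plain lists of lists, and `classical` merely stands in for the role a library routine such as sgemm would play at the bottom of the recursion. It uses the standard seven-product formulation, not Winograd's variant.

```python
# Illustrative sketch only: Strassen's recursion on square matrices
# stored as lists of lists. All names here are hypothetical.

def add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def classical(A, B):
    """Ordinary O(n^3) product, used as the base case."""
    Bt = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def split(M):
    """Split M into four quadrants: M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B, levels):
    """Apply `levels` levels of Strassen recursion, then fall back to
    the classical product. Odd-sized blocks also fall back."""
    if levels == 0 or len(A) == 1 or len(A) % 2:
        return classical(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    k = levels - 1
    # Seven block products instead of the classical eight.
    M1 = strassen(add(A11, A22), add(B11, B22), k)
    M2 = strassen(add(A21, A22), B11, k)
    M3 = strassen(A11, sub(B12, B22), k)
    M4 = strassen(A22, sub(B21, B11), k)
    M5 = strassen(add(A11, A12), B22, k)
    M6 = strassen(sub(A21, A11), add(B11, B12), k)
    M7 = strassen(sub(A12, A22), add(B21, B22), k)
    # Recombine the products into the quadrants of C.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom

if __name__ == "__main__":
    A = [[(i * 7 + j * 3) % 11 - 5 for j in range(8)] for i in range(8)]
    B = [[(i * 5 - j * 2) % 13 - 6 for j in range(8)] for i in range(8)]
    # With integer entries the result matches the classical product exactly.
    print(strassen(A, B, 2) == classical(A, B))
```

Each level replaces eight half-size block products with seven, so four levels perform (7/8)^4, roughly 59%, of the block multiplications of the classical method, at the cost of extra block additions and subtractions. Those additional additions and subtractions are also where floating-point rounding error can grow, whereas with integer entries (absent overflow) the sketch reproduces the classical result exactly.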