Cache-sensitive MapReduce DGEMM algorithms for shared memory architectures

  • Authors:
  • Gideon Nimako, E. J. Otoo, Daniel Ohene-Kwofie

  • Affiliations:
  • The University of the Witwatersrand, Johannesburg, South Africa (all authors)

  • Venue:
  • Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference
  • Year:
  • 2012


Abstract

Parallelism in linear algebra libraries is a common approach to accelerating numerical and scientific applications. Matrix-matrix multiplication is one of the most widely used computations in scientific and numerical algorithms. Although a number of matrix multiplication algorithms exist for distributed memory environments (e.g., Cannon, Fox, PUMMA, SUMMA), matrix-matrix multiplication algorithms for shared memory and SMP architectures have not been extensively studied. In this paper, we present a fast matrix-matrix multiplication algorithm for multi-core and SMP architectures using the MapReduce framework. Memory-resident linear algebra algorithms suffer performance losses on modern multi-core architectures because of the increasing performance gap between the CPU and main memory. To allow such compute-intensive algorithms to exploit the full potential of the program's inherent instruction-level parallelism, the adverse effect of the processor-memory performance gap must be minimized. We present a cache-sensitive MapReduce matrix multiplication algorithm that fully exploits memory bandwidth and minimizes cache misses and conflicts. Our experimental results show that the two algorithms outperform existing matrix multiplication algorithms for shared-memory architectures, such as those provided by the Phoenix, PLASMA and LAPACK libraries.
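To make the cache-sensitivity idea in the abstract concrete, here is a minimal sketch of a cache-blocked (tiled) DGEMM kernel. This is an illustration of the general blocking technique, not the paper's actual algorithm; the block size `BS` is an assumed tuning parameter chosen so that three tiles fit in cache, and in a MapReduce-style decomposition each output tile of `C` could be handled by one map task, with the accumulation over `k`-blocks acting as the reduce step.

```c
#include <stddef.h>

/* Assumed tile edge; in practice tuned so three BS x BS double
 * tiles (A, B and C blocks) fit in the L1 or L2 cache. */
#define BS 64

/* Computes C += A * B for square n x n row-major matrices,
 * iterating over BS x BS tiles so each tile of B and C is reused
 * from cache across the inner loops. */
static void dgemm_tiled(size_t n, const double *A, const double *B,
                        double *C)
{
    for (size_t ib = 0; ib < n; ib += BS)
        for (size_t kb = 0; kb < n; kb += BS)
            for (size_t jb = 0; jb < n; jb += BS) {
                /* Clamp tile edges at the matrix boundary. */
                size_t imax = ib + BS < n ? ib + BS : n;
                size_t kmax = kb + BS < n ? kb + BS : n;
                size_t jmax = jb + BS < n ? jb + BS : n;
                for (size_t i = ib; i < imax; i++)
                    for (size_t k = kb; k < kmax; k++) {
                        double a = A[i * n + k];
                        /* Unit-stride sweep over a row of B and C:
                         * contiguous accesses, few cache misses. */
                        for (size_t j = jb; j < jmax; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

The loop order (`i`, `k`, `j` innermost) keeps the innermost accesses to `B` and `C` unit-stride, which is the main lever a cache-sensitive kernel has against the processor-memory gap the abstract describes.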