Fine tuning matrix multiplications on multicore

  • Authors:
  • Stéphane Zuckerman; Marc Pérache; William Jalby

  • Affiliations:
  • LRC, ITACA, University of Versailles and CEA, DAM; LRC, ITACA, University of Versailles and CEA, DAM; LRC, ITACA, University of Versailles and CEA, DAM

  • Venue:
  • HiPC'08 Proceedings of the 15th international conference on High performance computing
  • Year:
  • 2008

Abstract

Multicore systems are becoming ubiquitous in scientific computing. Although performance libraries are being adapted to such systems, extracting the best performance from them remains difficult. Indeed, performance libraries such as Intel's MKL, while performing very well on unicore architectures, see their behavior degrade when used on multicore systems. Moreover, multicore systems themselves differ widely from one another (presence of shared caches, memory bandwidth, etc.). We propose a systematic method to improve the parallel execution of matrix multiplication, based on a study of the behavior of unicore DGEMM kernels in MKL as well as various other criteria. We show that our fine-tuning can outperform Intel's parallel DGEMM of MKL, with performance gains sometimes reaching a factor of two.