Fine tuning matrix multiplications on multicore

Authors:
Stéphane Zuckerman;Marc Pérache;William Jalby
Affiliations:
LRC, ITACA, University of Versailles and CEA, DAM;LRC, ITACA, University of Versailles and CEA, DAM;LRC, ITACA, University of Versailles and CEA, DAM
Venue:
HiPC'08 Proceedings of the 15th international conference on High performance computing
Year:
2008

Abstract

Multicore systems are becoming ubiquitous in scientific computing. Although performance libraries are being adapted to such systems, extracting the best performance from them remains difficult. Indeed, performance libraries such as Intel's MKL, while performing very well on unicore architectures, see their behavior degrade when used on multicore systems. Moreover, multicore systems themselves differ widely from one another (presence of shared caches, memory bandwidth, etc.). We propose a systematic method to improve the parallel execution of matrix multiplication, through the study of the behavior of unicore DGEMM kernels in MKL, as well as various other criteria. We show that our fine-tuning can outperform Intel's parallel DGEMM of MKL, with performance gains sometimes up to a factor of two.
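
The general idea of building a parallel matrix multiplication out of single-core kernels can be illustrated with a minimal sketch. This is not the authors' method: the names `sequential_gemm` and `parallel_gemm`, the row-block decomposition, and the `workers` parameter are assumptions made for illustration, and the pure-Python kernel only stands in for a tuned unicore DGEMM such as the one in MKL.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_gemm(a_rows, b):
    """Hypothetical stand-in for a single-core DGEMM kernel: a_rows times b."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a_rows
    ]


def parallel_gemm(a, b, workers=4):
    """Split A into contiguous row blocks and run one kernel call per block."""
    n = len(a)
    block = max(1, -(-n // workers))  # ceiling division: rows per block
    blocks = [a[i:i + block] for i in range(0, n, block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order the blocks were submitted
        partial = list(pool.map(sequential_gemm, blocks, [b] * len(blocks)))
    # Concatenate the per-block results back into the full product
    return [row for part in partial for row in part]


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    print(parallel_gemm(a, b, workers=2))
```

The sketch shows only the decomposition. In CPython the threads do not speed up a pure-Python kernel, and a real implementation would call a compiled sequential DGEMM on each block. How the blocks are shaped and assigned to cores is the kind of choice that would then be tuned against the measured behavior of that kernel on the target machine.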