Automatic calibration of performance models on heterogeneous multicore architectures

Authors:
Cédric Augonnet;Samuel Thibault;Raymond Namyst
Affiliations:
INRIA Bordeaux, LaBRI, University of Bordeaux;INRIA Bordeaux, LaBRI, University of Bordeaux;INRIA Bordeaux, LaBRI, University of Bordeaux
Venue:
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Year:
2009

Citing 8
Cited 2

Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing

IEEE Transactions on Parallel and Distributed Systems
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CellSs: a programming model for the cell BE architecture

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Predictive Runtime Code Scheduling for Heterogeneous Architectures

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
A Note on Auto-tuning GEMM for GPUs

ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Extending the OpenMP tasking model to allow dependent tasks

IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism

Automatic performance model synthesis from hardware verification models

Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
A static task partitioning approach for heterogeneous systems using OpenCL

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multicore architectures featuring specialized accelerators are getting an increasing amount of attention, and this success will probably influence the design of future High Performance Computing hardware. Unfortunately, programmers are actually having a hard time trying to exploit all these heterogeneous computing units efficiently, and most existing efforts simply focus on providing tools to offload some computations on available accelerators. Recently, some runtime systems have been designed that exploit the idea of scheduling - as opposed to offloading - parallel tasks over the whole set of heterogeneous computing units. Scheduling tasks over heterogeneous platforms makes it necessary to use accurate prediction models in order to assign each task to its most adequate computing unit [2]. A deep knowledge of the application is usually required to model per-task performance models, based on the algorithmic complexity of the underlying numeric kernel. We present an alternate, auto-tuning performance prediction approach based on performance history tables dynamically built during the application run. This approach does not require that the programmer provides some specific information. We show that, thanks to the use of a carefully chosen hash-function, our approach quickly achieves accurate performance estimations automatically. Our approach even outperforms regular algorithmic performance models with several linear algebra numerical kernels.