A Unified Runtime System for Heterogeneous Multi-core Architectures

Authors:
Cédric Augonnet;Raymond Namyst
Affiliations:
INRIA Bordeaux - LaBRI, University of Bordeaux;INRIA Bordeaux - LaBRI, University of Bordeaux
Venue:
Euro-Par 2008 Workshops - Parallel Processing
Year:
2009

Abstract

Approaching the theoretical performance of heterogeneous multicore architectures equipped with specialized accelerators is a challenging issue. Unlike regular CPUs, which can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading parts of a program onto such coprocessors, the real challenge is to find a programming model providing a unified view of all available computing units. In this paper, we present an original runtime system providing a high-level, unified execution model allowing seamless execution of tasks over the underlying heterogeneous hardware. The runtime is based on a hierarchical memory management facility and on a codelet scheduler. We demonstrate the efficiency of our solution with an LU decomposition on both homogeneous machines (speedup of 3.8 on 4 cores) and heterogeneous machines (95% efficiency). We also show that "granularity aware" scheduling can improve execution time by 35%.
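The idea of a codelet that runs unchanged on whichever processing unit is available can be illustrated with a small sketch. This is not the paper's runtime or its API: the names `Codelet`, `Worker`, and `run_tasks`, the "cpu"/"gpu"/"spu" labels, and the round-robin dispatch policy are all assumptions made for illustration, and the sketch leaves out data transfers and memory management entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Codelet:
    """A task with one implementation per architecture kind (hypothetical structure)."""
    name: str
    impls: Dict[str, Callable[[List[float]], List[float]]]


@dataclass
class Worker:
    """A processing unit; `kind` is an illustrative label such as "cpu" or "gpu"."""
    name: str
    kind: str
    executed: int = 0


def scale_cpu(data: List[float]) -> List[float]:
    # Plain-Python stand-in for a CPU implementation.
    return [2.0 * x for x in data]


def scale_accel(data: List[float]) -> List[float]:
    # Stand-in for an accelerator kernel: same result, separate code path.
    return list(map(lambda x: 2.0 * x, data))


def run_tasks(codelet: Codelet, chunks: List[List[float]],
              workers: List[Worker]) -> List[List[float]]:
    """Round-robin dispatch over the workers that have an implementation.

    This is an assumed, deliberately naive policy, not the scheduler
    described in the paper.
    """
    usable = [w for w in workers if w.kind in codelet.impls]
    if not usable:
        raise ValueError("no worker can run codelet " + codelet.name)
    results = []
    for i, chunk in enumerate(chunks):
        worker = usable[i % len(usable)]
        results.append(codelet.impls[worker.kind](chunk))
        worker.executed += 1
    return results


scale = Codelet("scale", {"cpu": scale_cpu, "gpu": scale_accel})
workers = [Worker("cpu0", "cpu"), Worker("gpu0", "gpu"), Worker("spu0", "spu")]
out = run_tasks(scale, [[1.0, 2.0], [3.0], [4.0, 5.0]], workers)
```

The caller submits work through a single interface and never names a device. The worker labeled "spu" is skipped because the codelet provides no implementation for that kind.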