A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures

Authors:
Eduard Ayguade;Rosa M. Badia;Daniel Cabrera;Alejandro Duran;Marc Gonzalez;Francisco Igual;Daniel Jimenez;Jesus Labarta;Xavier Martorell;Rafael Mayo;Josep M. Perez;Enrique S. Quintana-Ortí
Affiliations:
Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS);Barcelona Supercomputing Center (BSC-CNS);Universitat Politècnica de Catalunya (UPC);Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS);Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS);Universidad Jaume I, Castellon;Universitat Politècnica de Catalunya (UPC);Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS);Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS);Universidad Jaume I, Castellon;Barcelona Supercomputing Center (BSC-CNS);Universidad Jaume I, Castellon
Venue:
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Year:
2009

References (8):
CellSs: a programming model for the cell BE architecture. Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Compilation for explicitly managed memory hierarchies. Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Merge: a programming model for heterogeneous multi-core systems. Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Larrabee: a many-core x86 architecture for visual computing. ACM SIGGRAPH 2008 papers
Supporting OpenMP on cell. International Journal of Parallel Programming
OpenMP to GPGPU: a compiler framework for automatic translation and optimization. Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
A streaming machine description and programming model. SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Extending the OpenMP tasking model to allow dependent tasks. IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism

Cited by (16):
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs. Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Cost-aware function migration in heterogeneous systems. Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL. LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
OpenMP extensions for heterogeneous architectures. IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Case studies in automatic GPGPU code generation with llc. Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Productive cluster programming with OmpSs. Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Supporting OpenMP on a multi-cluster embedded MPSoC. Microprocessors & Microsystems
Optimization strategies in different CUDA architectures using llCoMP. Microprocessors & Microsystems
Analysis of task offloading for accelerators. HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation. Proceedings of the 26th ACM international conference on Supercomputing
LIBKOMP, an efficient openMP runtime system for both fork-join and data flow paradigms. IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Overlapping computations with communications and i/o explicitly using OpenMP based heterogeneous threading models. IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
High-level support for pipeline parallelism on many-core architectures. Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
On the instrumentation of OpenMP and ompss tasking constructs. Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Improving the programmability of STHORM-based heterogeneous systems with offload-enabled OpenMP. Proceedings of the First International Workshop on Many-core Embedded Systems
OmpSs@Zynq all-programmable SoC ecosystem. Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays

Abstract

OpenMP has recently evolved towards expressing unstructured parallelism, targeting the parallelization of a broader range of applications in the current multicore era. Homogeneous multicore architectures from major vendors have become mainstream, but there are clear indications that a better performance/power ratio can be achieved using more specialized hardware (accelerators), such as SSE-based units or GPUs, which clearly deviate from the easy-to-understand shared-memory homogeneous architectures. This paper investigates whether OpenMP can still survive in this new scenario and proposes a possible way to extend the current specification to reasonably integrate heterogeneity while preserving simplicity and portability. The paper builds on a previous proposal that extended tasking with dependencies. The runtime is in charge of data movement, of task scheduling based on these data dependencies, and of selecting the appropriate target accelerator depending on system configuration and resource availability.
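
To make the idea of dependence-driven task scheduling concrete, here is a minimal sketch in C. It does not use the syntax proposed in the paper, which the abstract does not show. It assumes instead the `depend` clauses that were standardized later, in OpenMP 4.0. The function names (`scale`, `sum`, `scaled_sum`) are hypothetical, and accelerator selection and data movement are not illustrated; the sketch only shows how declared inputs and outputs let a runtime order two tasks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: out[i] = in[i] * factor. */
static void scale(const double *in, double *out, size_t n, double factor) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}

/* Hypothetical kernel: returns the sum of the first n elements. */
static double sum(const double *in, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += in[i];
    }
    return acc;
}

/* Runs scale and then sum as two tasks. The second task reads tmp,
 * which the first task writes, so the runtime must order them.
 * Without OpenMP enabled the pragmas are ignored and the code runs
 * serially with the same result. */
double scaled_sum(const double *a, double *tmp, size_t n, double factor) {
    assert(n > 0);
    double result = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer: reads a, writes tmp. */
        #pragma omp task depend(in: a[0:n]) depend(out: tmp[0:n])
        scale(a, tmp, n, factor);

        /* Consumer: reads tmp, so it waits for the producer. */
        #pragma omp task depend(in: tmp[0:n]) depend(out: result) shared(result)
        result = sum(tmp, n);

        /* Wait for both tasks before leaving the single region. */
        #pragma omp taskwait
    }
    return result;
}
```

Calling `scaled_sum` on the array {1.0, 2.0, 3.0} with a factor of 2.0 fills `tmp` with {2.0, 4.0, 6.0} and returns 12.0. The programmer states only which data each task reads and writes; the ordering is left to the runtime, which is the division of labor the abstract describes.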