A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures

  • Authors:
  • Eduard Ayguade; Rosa M. Badia; Daniel Cabrera; Alejandro Duran; Marc Gonzalez; Francisco Igual; Daniel Jimenez; Jesus Labarta; Xavier Martorell; Rafael Mayo; Josep M. Perez; Enrique S. Quintana-Ortí

  • Affiliations:
  • Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS); Barcelona Supercomputing Center (BSC-CNS); Universitat Politècnica de Catalunya (UPC); Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS); Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS); Universidad Jaume I, Castellon; Universitat Politècnica de Catalunya (UPC); Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS); Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC-CNS); Universidad Jaume I, Castellon; Barcelona Supercomputing Center (BSC-CNS); Universidad Jaume I, Castellon

  • Venue:
  • IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
  • Year:
  • 2009

Abstract

OpenMP has recently evolved towards expressing unstructured parallelism, targeting the parallelization of a broader range of applications in the current multicore era. Homogeneous multicore architectures from major vendors have become mainstream, but there are clear indications that a better performance/power ratio can be achieved with more specialized hardware (accelerators), such as SSE-based units or GPUs, which deviates clearly from the easy-to-understand shared-memory homogeneous architectures. This paper investigates whether OpenMP can still survive in this new scenario and proposes a possible way to extend the current specification to reasonably integrate heterogeneity while preserving simplicity and portability. The paper builds on a previous proposal that extended tasking with dependencies. The runtime is in charge of data movement, task scheduling based on these data dependencies, and the appropriate selection of the target accelerator depending on system configuration and resource availability.
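
To make the idea of dependence-driven task scheduling concrete, the following is a minimal sketch in C. It does not use the syntax proposed in the paper, which is not shown in this abstract. Instead it assumes the standard `depend` clauses on `task` that were standardized later, in OpenMP 4.0. The function names `scale`, `sum` and `scaled_sum` are hypothetical and chosen only for illustration. Both tasks run on the host, so the sketch shows only how declared data dependencies let a runtime order tasks; it does not show data movement or accelerator selection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (illustration only): out[i] = in[i] * k. */
static void scale(const float *in, float *out, size_t n, float k) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * k;
    }
}

/* Hypothetical kernel (illustration only): sum of n elements. */
static float sum(const float *in, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        s += in[i];
    }
    return s;
}

/*
 * Two-stage pipeline expressed as dependent tasks.
 * The first task writes tmp[0:n]; the second reads tmp[0:n].
 * The depend clauses tell the runtime that the second task must
 * wait for the first, without any explicit synchronization
 * between the two tasks.
 */
float scaled_sum(const float *a, float *tmp, size_t n, float k) {
    assert(a != NULL && tmp != NULL && n > 0);
    float result = 0.0f;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer: reads a, writes tmp. */
        #pragma omp task depend(in: a[0:n]) depend(out: tmp[0:n])
        scale(a, tmp, n, k);

        /* Consumer: reads tmp, writes result. */
        #pragma omp task depend(in: tmp[0:n]) depend(out: result) shared(result)
        result = sum(tmp, n);

        /* Wait for both tasks before leaving the single region. */
        #pragma omp taskwait
    }
    return result;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` on GCC or Clang), the runtime orders the two tasks from the declared dependencies. Compiled without it, the pragmas are ignored and the code runs sequentially with the same result, which reflects the portability goal stated in the abstract.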