International Journal of Parallel Programming
In this paper, we present OmpSs, a programming model based on OpenMP and StarSs that can also incorporate OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E., and GPUs, demonstrating the broad applicability of the approach. The evaluation uses four benchmarks: Matrix Multiply, BlackScholes, Perlin Noise, and Julia Set. We compare the results against executions of the same benchmarks written in OpenCL on the same architectures. The results show that OmpSs greatly outperforms the OpenCL environment: it is more flexible in exploiting multiple accelerators, and, thanks to the simplicity of its annotations, it increases programmer productivity.