Multi-target C++ implementation of parallel skeletons

Authors:
Wilfried Kirschenmann; Laurent Plagne; Stephane Vialle
Affiliations:
EDF R&D & AlGorille INRIA, Clamart, France; EDF R&D, Clamart, France; SUPELEC - IMS group & AlGorille INRIA project team, Metz Cedex, France
Venue:
Proceedings of the 8th workshop on Parallel/High-Performance Object-Oriented Scientific Computing
Year:
2009

Abstract

This paper presents the design of an efficient multi-target (CPU+GPU) implementation of the Parallel_for skeleton. Emerging massively parallel architectures promise very high performance at low cost. However, these architectures change faster than ever, so optimizing code for them becomes a very complex and time-consuming task. We have identified data storage as the main difference between the CPU and GPU implementations of a code. We introduce an abstract data layout in order to adapt the data storage to each target. Based on this layout, the Parallel_for skeleton makes it possible to compile and execute the same program on both CPU and GPU. Once compiled, the program runs close to the hardware limits.
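
The idea of writing a loop body once against an abstract data layout can be illustrated with a minimal C++ sketch. This is not the paper's library: the names AoSLayout, SoALayout, Field, parallel_for and scaled_sum are hypothetical, the choice of array-of-structures versus structure-of-arrays as the two storage schemes is an assumption made for illustration, and the skeleton shown here is a sequential stand-in where a real implementation would dispatch to a CPU or GPU backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (assumption, not the paper's API).
// Each maps (element i, component c) to an offset in one flat buffer.
struct AoSLayout {
    // Array of structures: the components of one element are contiguous.
    static std::size_t index(std::size_t i, std::size_t c,
                             std::size_t /*n*/, std::size_t ncomp) {
        return i * ncomp + c;
    }
};

struct SoALayout {
    // Structure of arrays: one component of all elements is contiguous.
    static std::size_t index(std::size_t i, std::size_t c,
                             std::size_t n, std::size_t /*ncomp*/) {
        return c * n + i;
    }
};

// A container whose storage order is chosen at compile time by Layout.
template <class Layout>
class Field {
public:
    Field(std::size_t n, std::size_t ncomp)
        : n_(n), ncomp_(ncomp), data_(n * ncomp, 0.0) {}

    double& operator()(std::size_t i, std::size_t c) {
        assert(i < n_ && c < ncomp_);
        return data_[Layout::index(i, c, n_, ncomp_)];
    }

    std::size_t size() const { return n_; }

private:
    std::size_t n_, ncomp_;
    std::vector<double> data_;
};

// Sequential stand-in for the skeleton: applies body to every index in
// [begin, end). A real backend would distribute these iterations.
template <class Body>
void parallel_for(std::size_t begin, std::size_t end, Body body) {
    for (std::size_t i = begin; i < end; ++i) body(i);
}

// User code written once; only the Layout template argument changes.
template <class Layout>
double scaled_sum(std::size_t n) {
    Field<Layout> f(n, 2);
    parallel_for(0, n, [&](std::size_t i) {
        f(i, 0) = static_cast<double>(i);
        f(i, 1) = 2.0 * f(i, 0);
    });
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += f(i, 0) + f(i, 1);
    return sum;
}
```

Calling scaled_sum<AoSLayout>(n) and scaled_sum<SoALayout>(n) runs the same loop body over two different memory arrangements and yields the same value, which is the property the abstract relies on: the algorithm is expressed once, and only the storage mapping is adapted per target.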