This paper presents the design of an efficient multi-target (CPU+GPU) implementation of the Parallel_for skeleton. Emerging massively parallel architectures promise very high performance at low cost. However, these architectures change faster than ever, so optimizing code for them has become a complex and time-consuming task. We identify data storage as the main difference between the CPU and GPU implementations of a code, and we introduce an abstract data layout that adapts the storage to the target. Based on this layout, the Parallel_for skeleton makes it possible to compile and execute the same program on both the CPU and the GPU. Once compiled, the program runs close to the hardware limits.
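The idea of an abstract data layout combined with a Parallel_for skeleton can be illustrated with a small C++ sketch. This is not the paper's API. `Fields`, `InterleavedLayout`, `FieldMajorLayout`, `parallel_for` and `saxpy_fields` are hypothetical names introduced for illustration only. The user kernel addresses data as (element, field) and never computes raw offsets. A layout policy chosen at compile time decides whether an element's fields are stored contiguously (array of structures, convenient for a CPU core) or whether each field is stored contiguously (structure of arrays, which lets consecutive GPU threads read consecutive addresses). The `parallel_for` shown here is a sequential reference backend that only stands in for a real CPU-thread or GPU backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policy: the fields of one element are contiguous
// (array of structures), which suits a CPU core processing one element at a time.
struct InterleavedLayout {
    static std::size_t index(std::size_t i, std::size_t f,
                             std::size_t /*n*/, std::size_t nfields) {
        return i * nfields + f;
    }
};

// Hypothetical layout policy: each field is stored contiguously (structure of
// arrays), so consecutive indices i map to consecutive addresses, as a GPU prefers.
struct FieldMajorLayout {
    static std::size_t index(std::size_t i, std::size_t f,
                             std::size_t n, std::size_t /*nfields*/) {
        return f * n + i;
    }
};

// Abstract container: user code accesses (element, field) pairs, and the
// Layout policy alone decides where each value lives in memory.
template <typename Layout>
class Fields {
public:
    Fields(std::size_t n, std::size_t nfields)
        : n_(n), nfields_(nfields), data_(n * nfields, 0.0f) {}

    float& operator()(std::size_t i, std::size_t f) {
        assert(i < n_ && f < nfields_);  // bounds check in debug builds
        return data_[Layout::index(i, f, n_, nfields_)];
    }

    std::size_t size() const { return n_; }
    const std::vector<float>& raw() const { return data_; }

private:
    std::size_t n_, nfields_;
    std::vector<float> data_;
};

// Hypothetical Parallel_for skeleton: applies `body` to every index in [0, n).
// This reference backend runs sequentially; a real backend would split the
// range across CPU threads or launch one GPU thread per index.
template <typename Body>
void parallel_for(std::size_t n, Body body) {
    for (std::size_t i = 0; i < n; ++i) body(i);
}

// The same kernel source compiles unchanged against either layout:
// y = a * x + y, with x stored in field 0 and y in field 1.
template <typename Layout>
void saxpy_fields(Fields<Layout>& v, float a) {
    parallel_for(v.size(), [&](std::size_t i) {
        v(i, 1) = a * v(i, 0) + v(i, 1);
    });
}
```

As a usage example, filling three elements with x = i and y = 1 and then calling `saxpy_fields(v, 2.0f)` yields y = 1, 3, 5 for both layouts. Only the order of values in `raw()` differs. In the interleaved layout, x and y alternate, while in the field-major layout all x values come first, followed by all y values. Under these assumptions, switching targets amounts to switching the layout parameter and the `parallel_for` backend, while the kernel itself stays the same.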