SkePU: a multi-backend skeleton programming library for multi-GPU systems

Authors:
Johan Enmyren; Christoph W. Kessler
Affiliations:
Linköping University, Linköping, Sweden; Linköping University, Linköping, Sweden
Venue:
Proceedings of the Fourth International Workshop on High-Level Parallel Programming and Applications
Year:
2010

Abstract

We present SkePU, a C++ template library that provides a simple, unified interface for specifying data-parallel computations on GPUs with the help of skeletons, using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU backend and a parallel OpenMP backend. It also supports multi-GPU systems. Copying data between host memory and GPU device memory can be a performance bottleneck. A key technique in SkePU is lazy memory copying, implemented in the container type used to represent skeleton operands, which avoids unnecessary memory transfers. We evaluate SkePU with small benchmarks and a larger application, a Runge-Kutta ODE solver. The results show that a skeleton approach to GPU programming is viable, especially when the computational load is large compared to memory I/O (lazy memory copying can help to achieve this). They also show that using several GPUs has the potential for performance gains. SkePU also performs well on a more complex and realistic task such as ODE solving, with run times up to 10 times faster when using SkePU with a GPU backend than with a sequential solver running on a fast CPU.
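
To make the lazy memory copying idea concrete, the following is a minimal sketch and not SkePU's actual container or API. The names `LazyVector`, `map_on_device`, `TransferCounts`, and `chained_maps_demo` are hypothetical, and the "device" buffer is simulated by a second host-side buffer so the example runs without a GPU; a real backend would issue CUDA or OpenCL copies where the counters are incremented. The container tracks which copy of the data is current and transfers only when a stale copy is about to be read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lazy-copy container (an illustration, not SkePU's Vector).
// "Device memory" is simulated by a second host buffer, and transfers are
// counted rather than performed through CUDA or OpenCL.
template <typename T>
class LazyVector {
public:
    explicit LazyVector(std::size_t n, T init = T())
        : host_(n, init), device_(n),
          host_valid_(true), device_valid_(false),
          uploads_(0), downloads_(0) {}

    std::size_t size() const { return host_.size(); }

    // Host-side read: downloads only if the device holds newer data.
    T get(std::size_t i) { sync_to_host(); return host_[i]; }

    // Host-side write: makes the device copy stale.
    void set(std::size_t i, T v) {
        sync_to_host();
        host_[i] = v;
        device_valid_ = false;
    }

    // Device-side read access: uploads only if the device copy is stale.
    const std::vector<T>& device_read() { sync_to_device(); return device_; }

    // Device-side access for a full overwrite: no upload is needed,
    // and the host copy becomes stale.
    std::vector<T>& device_write() {
        device_valid_ = true;
        host_valid_ = false;
        return device_;
    }

    int uploads() const { return uploads_; }
    int downloads() const { return downloads_; }

private:
    void sync_to_device() {
        if (!device_valid_) { device_ = host_; ++uploads_; device_valid_ = true; }
    }
    void sync_to_host() {
        if (!host_valid_) { host_ = device_; ++downloads_; host_valid_ = true; }
    }

    std::vector<T> host_;
    std::vector<T> device_;
    bool host_valid_;
    bool device_valid_;
    int uploads_;
    int downloads_;
};

// A Map-like skeleton that works on the (simulated) device copies.
template <typename T, typename F>
void map_on_device(F f, LazyVector<T>& in, LazyVector<T>& out) {
    assert(in.size() == out.size());
    const std::vector<T>& src = in.device_read();
    std::vector<T>& dst = out.device_write();
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = f(src[i]);
}

struct TransferCounts { int uploads; int downloads; float first; };

// Chains two maps and reads one result element back on the host.
inline TransferCounts chained_maps_demo() {
    LazyVector<float> a(1000, 1.0f), b(1000), c(1000);
    map_on_device([](float x) { return x + 1.0f; }, a, b);  // uploads a
    map_on_device([](float x) { return x * 2.0f; }, b, c);  // b is already on the device
    float first = c.get(0);                                  // downloads c once
    TransferCounts t = { a.uploads() + b.uploads() + c.uploads(),
                         a.downloads() + b.downloads() + c.downloads(),
                         first };
    return t;
}
```

In this sketch, chaining two maps costs one upload and one download in total, because the intermediate operand `b` never leaves the simulated device. An eager scheme that copied every operand to the device before each call and every result back afterwards would transfer `b` twice more without any benefit.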