Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems

Authors:
Usman Dastgeer;Johan Enmyren;Christoph W. Kessler
Affiliations:
Linköping University, Linköping, Sweden;Linköping University, Linköping, Sweden;Linköping University, Linköping, Sweden
Venue:
Proceedings of the 4th International Workshop on Multicore Software Engineering
Year:
2011

Abstract

SkePU is a C++ template library that provides a simple, unified interface for specifying data-parallel computations with skeletons on GPUs using CUDA and OpenCL. The interface is general enough to support other architectures as well, and SkePU implements both a sequential CPU backend and a parallel OpenMP backend. It also supports multi-GPU systems. The skeletons currently available in SkePU are map, reduce, mapreduce, map-with-overlap, maparray, and scan. The performance of SkePU-generated code is comparable to that of hand-written code, even for more complex applications such as ODE solving. In this paper, we discuss initial results from auto-tuning SkePU with an offline machine-learning approach, in which skeletons are adapted to a given platform using training data. At execution time, the prediction mechanism uses offline pre-calculated estimates to construct an execution plan for any desired configuration with minimal overhead. It accurately predicts execution time for repeated executions and includes a mechanism for predicting execution time for user functions of different complexity. The tuning framework covers both the selection between backends and the choice of optimal parameter values for the selected backend. We discuss our approach and the initial results obtained for the map, mapreduce, and reduce skeletons.