Auto-tuning SkePU: a multi-backend skeleton programming framework for multi-GPU systems

  • Authors:
  • Usman Dastgeer;Johan Enmyren;Christoph W. Kessler

  • Affiliations:
  • Linköping University, Linköping, Sweden;Linköping University, Linköping, Sweden;Linköping University, Linköping, Sweden

  • Venue:
  • Proceedings of the 4th International Workshop on Multicore Software Engineering
  • Year:
  • 2011

Abstract

SkePU is a C++ template library that provides a simple, unified interface for specifying data-parallel computations as skeletons on GPUs using CUDA and OpenCL. The interface is general enough to support other architectures as well, and SkePU implements both a sequential CPU backend and a parallel OpenMP backend. It also supports multi-GPU systems. Skeletons currently available in SkePU include map, reduce, mapreduce, map-with-overlap, maparray, and scan. The performance of SkePU-generated code is comparable to that of hand-written code, even for more complex applications such as ODE solving. In this paper, we discuss initial results from auto-tuning SkePU using an off-line, machine-learning approach in which we adapt skeletons to a given platform using training data. At execution time, the prediction mechanism uses estimates pre-calculated off-line to construct an execution plan for any desired configuration with minimal overhead. It accurately predicts execution time for repetitive executions and includes a mechanism for predicting execution time for user functions of different complexity. The tuning framework covers both the selection between different backends and the choice of optimal parameter values for the selected backend. We discuss our approach and initial results obtained for different skeletons (map, mapreduce, reduce).
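
To make the idea of an off-line execution plan concrete, the following is a minimal C++ sketch and not SkePU's actual API or the authors' tuning method. It assumes a simple linear cost model per backend (fixed overhead plus per-element cost), which stands in for the trained estimates described in the abstract. All names (`Backend`, `CostModel`, `ExecutionPlan`, `cheapest`, `make_example_plan`) and all cost figures are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <vector>

// Hypothetical backend identifiers, mirroring the backends named in the abstract.
enum class Backend { CPU, OpenMP, CUDA, OpenCL };

// Assumed cost model: time = fixed overhead + per-element cost * n.
// In a real tuner these coefficients would come from training runs.
struct CostModel {
    Backend backend;
    double overhead_us;
    double per_element_us;
    double estimate(std::size_t n) const {
        return overhead_us + per_element_us * static_cast<double>(n);
    }
};

// Return the backend with the lowest estimated time for problem size n.
inline Backend cheapest(const std::vector<CostModel>& models, std::size_t n) {
    assert(!models.empty());
    const CostModel* best = &models.front();
    for (const CostModel& m : models)
        if (m.estimate(n) < best->estimate(n)) best = &m;
    return best->backend;
}

// Execution plan: maps the lower bound of a size range to the backend
// predicted to be fastest for that range.
class ExecutionPlan {
public:
    // "Off-line" step: evaluate the models once at each sample size.
    ExecutionPlan(const std::vector<CostModel>& models,
                  const std::vector<std::size_t>& sample_sizes) {
        for (std::size_t n : sample_sizes) plan_[n] = cheapest(models, n);
    }

    // Run-time step: a logarithmic-time lookup, no model evaluation.
    Backend select(std::size_t n) const {
        if (plan_.empty()) return Backend::CPU;      // fallback for an empty plan
        auto it = plan_.upper_bound(n);              // first range starting above n
        if (it == plan_.begin()) return it->second;  // n below the smallest sample
        return std::prev(it)->second;                // range that contains n
    }

private:
    std::map<std::size_t, Backend> plan_;
};

// Example plan built from made-up coefficients (illustrative only).
inline ExecutionPlan make_example_plan() {
    std::vector<CostModel> models = {
        {Backend::CPU, 0.0, 1.0},
        {Backend::OpenMP, 50.0, 0.25},
        {Backend::CUDA, 500.0, 0.01},
    };
    return ExecutionPlan(models, {1, 100, 1000, 10000});
}
```

With the made-up coefficients above, `make_example_plan().select(50)` picks the sequential CPU backend, `select(500)` picks OpenMP, and `select(1000000)` picks CUDA. The plan is a step function over the sampled sizes, so its accuracy near a crossover point depends on how densely the sizes are sampled; this sketch also ignores per-backend parameter tuning, which the abstract lists as part of the framework.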