PARTANS: An autotuning framework for stencil computation on multi-GPU systems

Authors:
Thibaut Lutz;Christian Fensch;Murray Cole
Affiliations:
University of Edinburgh, Edinburgh, United Kingdom;University of Edinburgh, Edinburgh, United Kingdom;University of Edinburgh, Edinburgh, United Kingdom
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Year:
2013



Abstract

GPGPUs are a powerful and energy-efficient solution for many problems. For higher performance or larger problems, it is necessary to distribute the problem across multiple GPUs, increasing the already high programming complexity. In this article, we focus on abstracting the complexity of multi-GPU programming for stencil computation. We show that the best strategy depends not only on the stencil operator, problem size, and GPU, but also on the PCI Express layout. This adds nonuniform characteristics to a seemingly homogeneous setup, causing up to 23% performance loss. We address this issue with an autotuner that optimizes the distribution across multiple GPUs.
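
The abstract does not describe the implementation, so the following is only an illustrative sketch of what distributing a stencil across several devices involves. It is not PARTANS code. It assumes a 1D three-point averaging stencil, simulates each device as a contiguous chunk of a Python list, and uses hypothetical helper names (`split`, `jacobi_step`, `run_partitioned`). Each step needs one boundary cell from each neighbouring chunk (a halo exchange). On real hardware that exchange would travel over the interconnect, which is plausibly why the PCI Express layout affects the best distribution.

```python
def jacobi_step(cells, left, right):
    """One 3-point averaging step on a chunk.

    `left` and `right` are halo values supplied by the neighbouring chunks
    (or the fixed boundary value at the ends of the global grid).
    """
    padded = [left] + cells + [right]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]


def split(grid, parts):
    """Split `grid` into `parts` contiguous chunks of near-equal size.

    Assumes 1 <= parts <= len(grid) so that no chunk is empty.
    """
    base, extra = divmod(len(grid), parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(grid[start:start + size])
        start += size
    return chunks


def run_partitioned(grid, parts, steps, boundary=0.0):
    """Run `steps` stencil iterations with the grid split across `parts`
    simulated devices (hypothetical stand-ins for GPUs)."""
    chunks = split(grid, parts)
    for _ in range(steps):
        # Halo exchange: every chunk reads one edge cell from each neighbour.
        lefts = [boundary] + [c[-1] for c in chunks[:-1]]
        rights = [c[0] for c in chunks[1:]] + [boundary]
        chunks = [jacobi_step(c, l, r)
                  for c, l, r in zip(chunks, lefts, rights)]
    return [x for c in chunks for x in c]


if __name__ == "__main__":
    grid = [float(i % 5) for i in range(16)]
    single = run_partitioned(grid, 1, 10)
    multi = run_partitioned(grid, 4, 10)
    print("results match:", single == multi)
```

The partitioned run produces the same values as the single-chunk run because each cell is computed from the same three inputs in the same order. Only the amount of data crossing chunk boundaries changes with the number of parts, and that is the quantity a distribution autotuner would have to weigh against per-device compute time.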