Autotuning Wavefront Applications for Multicore Multi-GPU Hybrid Architectures

Authors:
Siddharth Mohanty;Murray Cole
Affiliations:
Institute for Computing Systems Architecture, University of Edinburgh, UK;Institute for Computing Systems Architecture, University of Edinburgh, UK
Venue:
Proceedings of Programming Models and Applications on Multicores and Manycores
Year:
2014

Abstract

Manual tuning of applications for heterogeneous parallel systems is tedious and complex. Optimizations are often not portable, and the whole process must be repeated when moving to a new system, or sometimes even to a different problem size. Pattern-based programming models provide structure that can assist in the creation of autotuners for such problems. We present a machine-learning-based autotuning framework that partitions the work created by applications following the wavefront pattern across systems comprising multicore CPUs and multiple GPU accelerators. The use of a pattern facilitates training on synthetically generated instances. Exhaustive search space exploration on real applications indicates that correct setting of the tuning factors leads to a maximum speedup of 20x over an optimized sequential baseline, with an average of 7.8x. Our machine-learned heuristics obtain 98% of this speedup, averaged across a range of applications and architectures.
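For readers unfamiliar with the wavefront pattern, the following is a minimal Python sketch, not the authors' framework. It assumes a square grid in which each cell depends on its west, north, and north-west neighbours, so all cells on one anti-diagonal can be computed independently. The function names, the `band` parameter, and the rule that sends the longest central diagonals to a "gpu" label are hypothetical illustrations of what a tuning factor for partitioning might look like; the abstract does not specify the actual tuning factors.

```python
def wavefront_diagonals(n):
    """Yield the cells of an n x n grid grouped by anti-diagonal (i + j == d)."""
    for d in range(2 * n - 1):
        lo = max(0, d - n + 1)
        hi = min(d, n - 1)
        yield [(i, d - i) for i in range(lo, hi + 1)]


def assign_devices(n, band):
    """Hypothetical partition: the 2*band+1 central (longest) diagonals are
    labelled 'gpu', all others 'cpu'. A negative band keeps everything on 'cpu'."""
    mid = n - 1  # index of the longest anti-diagonal
    return ["gpu" if abs(d - mid) <= band else "cpu" for d in range(2 * n - 1)]


def neighbours(grid, i, j):
    """Return (west, north, north-west) values, using 0 outside the grid."""
    west = grid[i][j - 1] if j > 0 else 0
    north = grid[i - 1][j] if i > 0 else 0
    nw = grid[i - 1][j - 1] if i > 0 and j > 0 else 0
    return west, north, nw


def run_wavefront(n, kernel):
    """Fill the grid diagonal by diagonal; cells within one diagonal are
    independent, which is where a parallel implementation would split work."""
    grid = [[0] * n for _ in range(n)]
    for cells in wavefront_diagonals(n):
        for i, j in cells:
            grid[i][j] = kernel(*neighbours(grid, i, j))
    return grid


def run_row_major(n, kernel):
    """Sequential baseline: plain row-by-row traversal of the same grid."""
    grid = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            grid[i][j] = kernel(*neighbours(grid, i, j))
    return grid


def example_kernel(west, north, nw):
    """Arbitrary illustrative cell update."""
    return west + north + nw + 1


if __name__ == "__main__":
    n = 4
    print(assign_devices(n, band=1))
    print(run_wavefront(n, example_kernel) == run_row_major(n, example_kernel))
```

The diagonal traversal produces the same grid as the row-major baseline because every cell's three dependencies lie on earlier diagonals. An autotuner in this setting would search over values such as `band` rather than fixing them by hand.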