An automatic input-sensitive approach for heterogeneous task partitioning

Authors:
Klaus Kofler;Ivan Grasso;Biagio Cosenza;Thomas Fahringer
Affiliations:
University of Innsbruck, Innsbruck, Austria;University of Innsbruck, Innsbruck, Austria;University of Innsbruck, Innsbruck, Austria;University of Innsbruck, Innsbruck, Austria
Venue:
Proceedings of the 27th international ACM conference on International conference on supercomputing
Year:
2013

Abstract

Unleashing the full potential of heterogeneous systems consisting of multi-core CPUs and GPUs is challenging because the computational resources differ in processing capabilities, memory availability, and communication latencies. In this paper we propose a novel approach that automatically optimizes task partitioning for different (input) problem sizes and different heterogeneous multi-core architectures. We use the Insieme source-to-source compiler to translate a single-device OpenCL program into a multi-device OpenCL program. The Insieme Runtime System then performs dynamic task partitioning based on an offline-generated prediction model. To derive the prediction model, we use a machine learning approach based on Artificial Neural Networks (ANN) that incorporates static program features as well as dynamic, input-sensitive features. Principal component analysis has been used to further improve the task partitioning. Evaluated on a suite of 23 programs, our approach achieves performance improvements of 22% and 25% over executing the benchmarks on a single CPU and a single GPU, respectively, which corresponds to 87.5% of the optimal performance.
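
The two ends of the pipeline described in the abstract can be illustrated with a minimal sketch: labelling the best CPU/GPU split offline, and applying a chosen split to an index range at run time. This is not the Insieme implementation. The function names, the 10% split granularity, and the toy runtime model are all assumptions made for illustration, and the ANN and principal component analysis steps that would map program features to a split are deliberately left out.

```python
from typing import Callable, List, Tuple

Split = Tuple[int, int]  # (cpu_percent, gpu_percent), hypothetical representation


def candidate_splits(step: int = 10) -> List[Split]:
    """Enumerate CPU/GPU splits in `step`-percent increments (assumed granularity)."""
    return [(cpu, 100 - cpu) for cpu in range(0, 101, step)]


def toy_runtime(split: Split, cpu_speed: float, gpu_speed: float) -> float:
    """Toy cost model (an assumption, not a measurement): devices run concurrently,
    so the slower device determines the total time."""
    cpu_percent, gpu_percent = split
    return max(cpu_percent / cpu_speed, gpu_percent / gpu_speed)


def best_split(splits: List[Split], runtime: Callable[[Split], float]) -> Split:
    """Offline labelling step: keep the split with the lowest observed runtime.
    In a real setup `runtime` would execute and time the program."""
    return min(splits, key=runtime)


def split_range(global_size: int, split: Split) -> List[Tuple[str, int, int]]:
    """Turn a percentage split into per-device (device, offset, count) chunks.
    Devices that receive no work are omitted."""
    cpu_percent, _ = split
    cpu_count = global_size * cpu_percent // 100
    chunks = [("cpu", 0, cpu_count), ("gpu", cpu_count, global_size - cpu_count)]
    return [chunk for chunk in chunks if chunk[2] > 0]


if __name__ == "__main__":
    # Hypothetical machine where the GPU is three times faster than the CPU.
    label = best_split(candidate_splits(), lambda s: toy_runtime(s, 1.0, 3.0))
    print(label, split_range(1000, label))
```

In a full system, the label produced by `best_split` for each program and input size would serve as the training target for the prediction model, and `split_range` would be applied to whichever split the model predicts for an unseen input.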