Introduction to Machine Learning (Adaptive Computation and Machine Learning)
Introduction to Machine Learning (Adaptive Computation and Machine Learning)
Predicting Unroll Factors Using Supervised Classification
Proceedings of the international symposium on Code generation and optimization
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
The Scalable Heterogeneous Computing (SHOC) benchmark suite
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
MapCG: writing parallel program portable between CPU and GPU
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A static task partitioning approach for heterogeneous systems using OpenCL
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Hybrid OpenCL: Enhancing OpenCL for Distributed Processing
ISPA '11 Proceedings of the 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications
Where is the data? Why you cannot debate CPU vs. GPU performance without the answer
ISPASS '11 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software
Automatic OpenCL device characterization: guiding optimized kernel design
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
OpenCL as a unified programming model for heterogeneous CPU/GPU clusters
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Enabling task-level scheduling on heterogeneous platforms
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters
Proceedings of the 26th ACM international conference on Supercomputing
CUDASA: compute unified device and systems architecture
EG PGV'08 Proceedings of the 8th Eurographics conference on Parallel Graphics and Visualization
Distributed Computation and Large-Scale Visualization in Heterogeneous Compute Environments
ISPDC '12 Proceedings of the 2012 11th International Symposium on Parallel and Distributed Computing
LibWater: heterogeneous distributed computing made easy
Proceedings of the 27th international ACM conference on International conference on supercomputing
LibWater: heterogeneous distributed computing made easy
Proceedings of the 27th international ACM conference on International conference on supercomputing
An application-centric evaluation of OpenCL on multi-core CPUs
Parallel Computing
Hi-index | 0.00 |
Unleashing the full potential of heterogeneous systems, consisting of multi-core CPUs and GPUs, is a challenging task due to the difference in processing capabilities, memory availability, and communication latencies of different computational resources. In this paper we propose a novel approach that automatically optimizes task partitioning for different (input) problem sizes and different heterogeneous multi-core architectures. We use the Insieme source-to-source compiler to translate a single-device OpenCL program into a multi-device OpenCL program. The Insieme Runtime System then performs dynamic task partitioning based on an offline-generated prediction model. In order to derive the prediction model, we use a machine learning approach based on Artificial Neural Networks (ANN) that incorporates static program features as well as dynamic, input sensitive features. Principal component analysis have been used to further improve the task partitioning. Our approach has been evaluated over a suite of 23 programs and respectively achieves a performance improvement of 22% and 25% compared to an execution of the benchmarks on a single CPU and a single GPU which is equal to 87.5% of the optimal performance.