Maestro: data orchestration and tuning for OpenCL devices

Authors:
Kyle Spafford; Jeremy Meredith; Jeffrey Vetter
Affiliations:
Oak Ridge National Laboratory; Oak Ridge National Laboratory; Oak Ridge National Laboratory
Venue:
Euro-Par'10: Proceedings of the 16th International Euro-Par Conference on Parallel Processing, Part II
Year:
2010

Abstract

As heterogeneous computing platforms become more prevalent, programmers must contend with complex memory hierarchies on top of the usual difficulties of parallel programming. OpenCL is an open standard for parallel computing that eases this burden by providing a portable set of abstractions for device memory hierarchies. However, OpenCL requires the programmer to control data transfer and device synchronization explicitly, two tedious and error-prone tasks. This paper introduces Maestro, an open source library for data orchestration on OpenCL devices. Maestro provides automatic data transfer, task decomposition across multiple devices, and autotuning of dynamic execution parameters for some types of problems.
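
To make the idea of task decomposition across multiple devices more concrete, here is a minimal sketch in Python. It is not Maestro's API and is not taken from the paper. It assumes one simple strategy: each device is given a contiguous share of a one-dimensional problem in proportion to a throughput figure measured beforehand. The function name `split_work` and the throughput values are hypothetical.

```python
def split_work(total_items, throughputs):
    """Split a 1-D problem across devices in proportion to throughput.

    total_items: number of work items in the whole problem.
    throughputs: measured items/second for each device (hypothetical
                 benchmark results; any positive numbers work).
    Returns one half-open (start, end) index range per device.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive values")

    total_rate = sum(throughputs)
    ranges = []
    start = 0
    cumulative = 0.0
    for i, rate in enumerate(throughputs):
        cumulative += rate
        if i == len(throughputs) - 1:
            # The last device takes the remainder so every item is covered once.
            end = total_items
        else:
            # Boundary is placed at the cumulative share of the total rate.
            end = round(total_items * cumulative / total_rate)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    # Assumed example: a device three times as fast as its peer gets 3/4 of the work.
    print(split_work(1000, [3.0, 1.0]))
```

In a real OpenCL program, each range would correspond to a sub-buffer transfer and a kernel launch on that device's command queue. Those steps are left out here so that the sketch runs without an OpenCL installation.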