Sparse matrix solvers on the GPU: conjugate gradients and multigrid
ACM SIGGRAPH 2003 Papers
Efficient gather and scatter operations on graphics processors
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
GPU acceleration of cutoff pair potentials for molecular modeling applications
Proceedings of the 5th conference on Computing frontiers
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems
Proceedings of the 23rd international conference on Supercomputing
The Scalable Heterogeneous Computing (SHOC) benchmark suite
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Accelerating S3D: a GPGPU case study
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
MDR: performance model driven runtime for heterogeneous parallel platforms
Proceedings of the international conference on Supercomputing
Enabling task-level scheduling on heterogeneous platforms
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Proceedings of the 9th conference on Computing Frontiers
Parameterized micro-benchmarking: an auto-tuning approach for complex applications
Proceedings of the 9th conference on Computing Frontiers
Automatic generation of software pipelines for heterogeneous parallel systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Dandelion: a compiler and runtime for heterogeneous systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Data Parallel Implementation of Belief Propagation in Factor Graphs on Multi-core Platforms
International Journal of Parallel Programming
Hi-index | 0.00 |
As heterogeneous computing platforms become more prevalent, the programmer must account for complex memory hierarchies in addition to the difficulties of parallel programming. OpenCL is an open standard for parallel computing that helps alleviate this difficulty by providing a portable set of abstractions for device memory hierarchies. However, OpenCL requires that the programmer explicitly controls data transfer and device synchronization, two tedious and error-prone tasks. This paper introduces Maestro, an open source library for data orchestration on OpenCL devices. Maestro provides automatic data transfer, task decomposition across multiple devices, and autotuning of dynamic execution parameters for some types of problems.