Allocating Independent Subtasks on Parallel Processors
IEEE Transactions on Software Engineering
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing
IEEE Transactions on Parallel and Distributed Systems
Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers
IEEE Transactions on Parallel and Distributed Systems
Distributed texture memory in a multi-GPU environment
GH '06 Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Harmony: an execution model and runtime for heterogeneous many core systems
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
A Compute Unified System Architecture for Graphics Clusters Incorporating Data Locality
IEEE Transactions on Visualization and Computer Graphics
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Power-Efficient Work Distribution Method for CPU-GPU Heterogeneous System
ISPA '10 Proceedings of the International Symposium on Parallel and Distributed Processing with Applications
Data-Aware Task Scheduling on Multi-accelerator Based Platforms
ICPADS '10 Proceedings of the 2010 IEEE 16th International Conference on Parallel and Distributed Systems
Achieving a single compute device image in OpenCL for multiple GPUs
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Automating GPU computing in MATLAB
Proceedings of the international conference on Supercomputing
Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Enabling task-level scheduling on heterogeneous platforms
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Hi-index | 0.00 |
Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches for partitioning work require offline training and generate fixed partitions that fail to respond to fluctuations in device performance that occur at run time. We present a novel dynamic approach to work partitioning that requires no offline training and responds automatically to performance variability to provide consistently good performance. Using six diverse OpenCL™ applications, we demonstrate the effectiveness of our approach in scenarios both with and without run-time performance variability, as well as in more extreme scenarios in which one device is non-functional.