Load balancing in a changing world: dealing with heterogeneity and performance variability

Authors:
Michael Boyer;Kevin Skadron;Shuai Che;Nuwan Jayasena
Affiliations:
University of Virginia;University of Virginia;AMD Research;AMD Research
Venue:
Proceedings of the ACM International Conference on Computing Frontiers
Year:
2013

Abstract

Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches for partitioning work require offline training and generate fixed partitions that fail to respond to fluctuations in device performance that occur at run time. We present a novel dynamic approach to work partitioning that requires no offline training and responds automatically to performance variability to provide consistently good performance. Using six diverse OpenCL™ applications, we demonstrate the effectiveness of our approach in scenarios both with and without run-time performance variability, as well as in more extreme scenarios in which one device is non-functional.
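
The abstract does not describe the partitioning algorithm itself, so the following is only a minimal illustrative sketch of the general idea of dynamic work partitioning, not the authors' method. It assumes work is issued in fixed-size chunks and that each chunk is split in proportion to the throughput each device showed on the previous chunk. The names split_by_rate, run_dynamic, and example_rate are hypothetical, and the devices are simulated with a rate function rather than real OpenCL calls. Every simulated rate must be positive, so the non-functional-device scenario mentioned in the abstract is not covered.

```python
def split_by_rate(n_items, rates):
    """Split n_items across devices in proportion to their estimated rates."""
    total = sum(rates)
    shares = [int(n_items * r / total) for r in rates]
    # Integer truncation can leave a few items over; give them to the
    # device with the highest estimated rate.
    shares[rates.index(max(rates))] += n_items - sum(shares)
    return shares


def run_dynamic(total_items, chunk, true_rate, n_devices=2):
    """Process total_items chunk by chunk, re-estimating rates every round.

    true_rate(device, round_no) is a hypothetical stand-in for hardware: it
    returns the (positive) items/second a device sustains in that round.
    Returns (elapsed_seconds, list of per-round shares).
    """
    est = [1.0] * n_devices  # no offline training: start from equal guesses
    done, elapsed, history, round_no = 0, 0.0, [], 0
    while done < total_items:
        n = min(chunk, total_items - done)
        shares = split_by_rate(n, est)
        # Devices run concurrently, so a round lasts as long as the slowest.
        times = [s / true_rate(d, round_no) for d, s in enumerate(shares)]
        elapsed += max(times)
        # Update each estimate from the throughput just observed; a device
        # that received no work keeps its previous estimate.
        for d, (s, t) in enumerate(zip(shares, times)):
            if s > 0 and t > 0:
                est[d] = s / t
        history.append(shares)
        done += n
        round_no += 1
    return elapsed, history


def example_rate(device, round_no):
    """Simulated devices: device 1 starts 4x faster, then slows in round 2."""
    if device == 0:
        return 100.0
    return 400.0 if round_no < 2 else 100.0


if __name__ == "__main__":
    elapsed, history = run_dynamic(4000, 1000, example_rate)
    for i, shares in enumerate(history):
        print(f"round {i}: shares = {shares}")
    print(f"simulated elapsed time: {elapsed} s")
```

In this simulation the split starts even, shifts toward the faster device after one round of measurement, and returns to an even split one round after that device slows down. The one-round lag is the cost of relying purely on run-time measurements in this sketch.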