In the last few years, thanks to their computational power, their progressively increasing programmability, and their wide adoption in both the research community and industry, GPUs have become part of HPC clusters (for example, the US Titan and Stampede supercomputers and the Chinese Tianhe-1A). As a result, widely used open-source cluster resource managers (e.g., SLURM and TORQUE) have recently been extended with GPU support. These systems, however, provide only simple scheduling mechanisms that often result in resource underutilization and, consequently, in suboptimal performance. In this paper, we propose a runtime system that can be integrated with existing cluster resource managers to enable a more efficient use of heterogeneous clusters with GPUs. Unlike previous work, we focus on multi-process GPU applications that include synchronization (for example, hybrid MPI-CUDA applications). We discuss the limitations and inefficiencies of existing scheduling and resource-sharing schemes in the presence of synchronization, and we show that preemption is an effective mechanism for efficiently scheduling hybrid MPI-CUDA applications. We validate our runtime on a variety of benchmark programs with different computation and communication patterns.
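The interaction between GPU sharing and synchronization can be illustrated with a toy model (our own simplification, not the paper's implementation): each rank of a hybrid MPI-CUDA job alternates a GPU kernel with a barrier, so if uncoordinated GPU sharing delays even one rank's kernel, the barrier propagates that delay to every rank on every iteration. The numbers below (`iters`, `kernel`, `skew`) are hypothetical.

```python
def makespan(iters, kernel, skew):
    """Total time for `iters` iterations of [GPU kernel, barrier].

    Each iteration is gated by the slowest rank at the barrier, so a
    per-iteration skew on any single rank inflates every iteration.
    """
    return iters * (kernel + skew)

# One rank's kernel is queued behind a co-located job each iteration:
uncoordinated = makespan(iters=100, kernel=10.0, skew=4.0)

# Kernels of all ranks are coscheduled (or the co-located kernel is
# preempted), so no rank is skewed at the barrier:
coordinated = makespan(iters=100, kernel=10.0, skew=0.0)

print(uncoordinated, coordinated)  # 1400.0 1000.0
```

Even a modest per-iteration skew thus translates into a proportional slowdown for the whole job, which is why coordinated scheduling with preemption pays off for synchronization-heavy workloads.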