In the last few years, thanks to their computational power, their progressively increasing programmability, and their wide adoption in both the research community and industry, GPUs have become part of HPC clusters (for example, the US Titan and Stampede supercomputers and the Chinese Tianhe-1A). As a result, widely used open-source cluster resource managers (e.g., SLURM and TORQUE) have recently been extended with GPU support. These systems, however, provide only simple scheduling mechanisms that often result in resource underutilization and, consequently, in suboptimal performance. In this paper, we propose a runtime system that can be integrated with existing cluster resource managers to enable a more efficient use of heterogeneous clusters with GPUs. Unlike previous work, we focus on multi-process GPU applications that include synchronization (for example, hybrid MPI-CUDA applications). We discuss the limitations and inefficiencies of existing scheduling and resource-sharing schemes in the presence of synchronization, and we show that preemption is an effective mechanism for efficiently scheduling hybrid MPI-CUDA applications. We validate our runtime on a variety of benchmark programs with different computation and communication patterns.
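The interaction between GPU sharing and synchronization can be illustrated with a toy model (our own simplification, not the paper's implementation): each rank of a hybrid MPI-CUDA job alternates a GPU kernel with a barrier, so if uncoordinated GPU sharing delays even one rank's kernel, the barrier propagates that delay to every rank on every iteration. The numbers below (`iters`, `kernel`, `skew`) are hypothetical.

```python
def makespan(iters, kernel, skew):
    """Total time for `iters` iterations of [GPU kernel, barrier].

    Each iteration is gated by the slowest rank at the barrier, so a
    per-iteration skew on any single rank inflates every iteration.
    """
    return iters * (kernel + skew)

# One rank's kernel is queued behind a co-located job each iteration:
uncoordinated = makespan(iters=100, kernel=10.0, skew=4.0)

# Kernels of all ranks are coscheduled (or the co-located kernel is
# preempted), so no rank is skewed at the barrier:
coordinated = makespan(iters=100, kernel=10.0, skew=0.0)

print(uncoordinated, coordinated)  # 1400.0 1000.0
```

Even a modest per-iteration skew thus translates into a proportional slowdown for the whole job, which is why coordinated scheduling with preemption pays off for synchronization-heavy workloads.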