Interference-driven resource management for GPU-based heterogeneous clusters

  • Authors:
  • Rajat Phull, Cheng-Hong Li, Kunal Rao, Hari Cadambi, Srimat Chakradhar

  • Affiliations:
  • NEC Laboratories America, Inc., Princeton, NJ, USA (all authors)

  • Venue:
  • Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing
  • Year:
  • 2012

Abstract

GPU-based clusters are increasingly being deployed in HPC environments to accelerate a variety of scientific applications. Despite their growing popularity, the GPU devices themselves are under-utilized even for many computationally intensive jobs. This stems from the fact that the typical GPU usage model is one in which a host processor periodically offloads computationally intensive portions of an application to the coprocessor. Since some portions of code cannot be offloaded to the GPU (for example, code performing network communication in MPI applications), this usage model results in periods of time when the GPU is idle. GPUs could be time-shared across jobs to "fill" these idle periods, but unlike CPU resources such as the cache, the effects of sharing the GPU are not well understood. Specifically, two jobs that time-share a single GPU will experience resource contention and interfere with each other. The resulting slowdown could lead to missed job deadlines. Current cluster managers do not support GPU sharing; instead, they dedicate GPUs to a job for the job's lifetime. In this paper, we present a framework to predict and handle interference when two or more jobs time-share GPUs in HPC clusters. Our framework consists of an analysis model and a dynamic interference detection and response mechanism that detects excessive interference and restarts the interfering jobs on different nodes. We implement our framework in Torque, an open-source cluster manager, and, using real workloads on an HPC cluster, show that interference-aware two-job colocation (although our method is applicable to colocating more than two jobs) improves GPU utilization by 25%, reduces a job's waiting time in the queue by 39%, and improves job latencies by around 20%.
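
The detection-and-response mechanism described in the abstract can be pictured as a monitoring loop that compares each colocated job's observed GPU throughput against its solo-run baseline and evicts a job once its slowdown crosses a threshold. The sketch below is an illustrative reconstruction only, not the authors' Torque implementation: the sample_throughput and restart_on_other_node callbacks, the 1.5x slowdown threshold, and the 5-second polling interval are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field

# Assumed tuning knobs (not taken from the paper).
SLOWDOWN_THRESHOLD = 1.5   # tolerate up to 50% slowdown before responding
CHECK_INTERVAL_SEC = 5.0   # monitoring period


@dataclass
class JobStats:
    """Per-job bookkeeping for one GPU-sharing job."""
    job_id: str
    solo_throughput: float                      # baseline kernels/sec without sharing
    shared_samples: list = field(default_factory=list)

    def slowdown(self) -> float:
        """Ratio of solo throughput to observed shared throughput (>= 1.0 means slower)."""
        if not self.shared_samples:
            return 1.0
        observed = sum(self.shared_samples) / len(self.shared_samples)
        return self.solo_throughput / observed if observed > 0 else float("inf")


def detect_excessive_interference(colocated_jobs):
    """Return the most-slowed job if its slowdown exceeds the threshold, else None."""
    if not colocated_jobs:
        return None
    worst = max(colocated_jobs, key=lambda job: job.slowdown())
    return worst if worst.slowdown() > SLOWDOWN_THRESHOLD else None


def monitor_gpu(colocated_jobs, sample_throughput, restart_on_other_node):
    """Detection-and-response loop for one shared GPU.

    sample_throughput(job_id) and restart_on_other_node(job_id) are hypothetical
    hooks into the cluster manager; the paper integrates its mechanism with Torque.
    """
    while colocated_jobs:
        for job in colocated_jobs:
            job.shared_samples.append(sample_throughput(job.job_id))
        victim = detect_excessive_interference(colocated_jobs)
        if victim is not None:
            restart_on_other_node(victim.job_id)   # relaunch on a different node
            colocated_jobs.remove(victim)
        time.sleep(CHECK_INTERVAL_SEC)
```

In this sketch the solo-run baselines would come from the paper's analysis model (or from profiling), while the response is deliberately coarse: the more-slowed job is handed back to the cluster manager for restart on another node, matching the behavior the abstract describes.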