A virtual memory based runtime to support multi-tenancy in clusters with GPUs

Authors:
Michela Becchi;Kittisak Sajjapongse;Ian Graves;Adam Procter;Vignesh Ravi;Srimat Chakradhar
Affiliations:
University of Missouri, Columbia, MO, USA;University of Missouri, Columbia, MO, USA;University of Missouri, Columbia, MO, USA;University of Missouri, Columbia, MO, USA;Ohio State University, Columbus, OH, USA;NEC Laboratories America, Princeton, NJ, USA
Venue:
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Year:
2012

Abstract

Graphics Processing Units (GPUs) are increasingly part of HPC clusters. Nevertheless, cloud computing services and resource management frameworks targeting heterogeneous clusters that include GPUs are still in their infancy. Further, GPU software stacks (e.g., the CUDA driver and runtime) currently provide very limited support for concurrency. In this paper, we propose a runtime system that provides abstraction and sharing of GPUs, while allowing isolation of concurrent applications. A central component of our runtime is a memory manager that provides a virtual memory abstraction to the applications. Our runtime is flexible in terms of scheduling policies, and allows dynamic (as opposed to programmer-defined) binding of applications to GPUs. In addition, our framework supports dynamic load balancing and dynamic upgrade and downgrade of GPUs, and is resilient to GPU failures. Our runtime can be deployed in combination with VM-based cloud computing services to allow virtualization of heterogeneous clusters, or in combination with HPC cluster resource managers to form an integrated resource management infrastructure for heterogeneous clusters. Experiments conducted on a three-node cluster show that our GPU sharing scheme yields performance improvements of up to 28% and 50% over serialized execution on short- and long-running jobs, respectively. Further, dynamic inter-node load balancing leads to an additional 18-20% performance benefit.
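
The abstract does not describe how the memory manager is built. The following is only a minimal sketch of the general idea, under stated assumptions, and not the authors' implementation. It assumes that applications hold opaque virtual handles, that each buffer has a host-side backing copy, and that data is moved to a device only when the application is bound to it. All names (`SimulatedGPU`, `VirtualMemoryManager`, `bind`, `unbind`, `demo`) are hypothetical, and the GPU is simulated with plain Python objects rather than CUDA calls.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SimulatedGPU:
    """Hypothetical stand-in for a physical device (no real CUDA involved)."""
    name: str
    capacity: int  # device memory in bytes
    resident: Dict[int, bytearray] = field(default_factory=dict)

    def used(self) -> int:
        return sum(len(buf) for buf in self.resident.values())


class VirtualMemoryManager:
    """Toy manager: applications see virtual handles, never device pointers."""

    def __init__(self) -> None:
        self._next_handle = 1
        self._host_copy: Dict[int, bytearray] = {}   # handle -> host backing store
        self._owner: Dict[int, str] = {}             # handle -> application id
        self._binding: Dict[str, SimulatedGPU] = {}  # application id -> device

    def _handles_of(self, app: str):
        return [h for h, owner in self._owner.items() if owner == app]

    def malloc(self, app: str, size: int) -> int:
        # Allocation only reserves host memory; no device is chosen yet.
        # Simplification: buffers allocated after bind() stay on the host
        # until the next bind().
        handle = self._next_handle
        self._next_handle += 1
        self._host_copy[handle] = bytearray(size)
        self._owner[handle] = app
        return handle

    def write(self, handle: int, data: bytes) -> None:
        buf = self._host_copy[handle]
        if len(data) > len(buf):
            raise ValueError("write exceeds buffer size")
        buf[:len(data)] = data
        gpu = self._binding.get(self._owner[handle])
        if gpu is not None and handle in gpu.resident:
            gpu.resident[handle][:len(data)] = data  # keep device copy in sync

    def read(self, handle: int) -> bytes:
        gpu = self._binding.get(self._owner[handle])
        if gpu is not None and handle in gpu.resident:
            return bytes(gpu.resident[handle])
        return bytes(self._host_copy[handle])

    def unbind(self, app: str) -> None:
        # Copy device-resident buffers back to the host and free device space.
        gpu = self._binding.pop(app)
        for h in self._handles_of(app):
            if h in gpu.resident:
                self._host_copy[h] = gpu.resident.pop(h)

    def bind(self, app: str, gpu: SimulatedGPU) -> None:
        # Dynamic binding: the runtime, not the programmer, picks the device.
        if app in self._binding:
            self.unbind(app)
        handles = self._handles_of(app)
        needed = sum(len(self._host_copy[h]) for h in handles)
        if gpu.used() + needed > gpu.capacity:
            raise MemoryError(f"{gpu.name} cannot hold buffers of {app}")
        for h in handles:
            gpu.resident[h] = bytearray(self._host_copy[h])
        self._binding[app] = gpu


def demo():
    vmm = VirtualMemoryManager()
    gpu0 = SimulatedGPU("gpu0", capacity=1024)
    gpu1 = SimulatedGPU("gpu1", capacity=1024)
    a = vmm.malloc("appA", 256)
    vmm.malloc("appB", 256)
    vmm.write(a, b"hello")
    vmm.bind("appA", gpu0)
    vmm.bind("appB", gpu0)      # two tenants share one device
    shared = gpu0.used()
    vmm.bind("appA", gpu1)      # rebinding migrates appA's data
    return shared, gpu0.used(), gpu1.used(), vmm.read(a)[:5]
```

Because applications only ever hold handles, rebinding `appA` from `gpu0` to `gpu1` in `demo()` needs no change on the application side. In this toy model, that indirection is what would make sharing, load balancing, and recovery from a failed device possible.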