Graphics Processing Units (GPUs) are increasingly becoming part of HPC clusters. Nevertheless, cloud computing services and resource management frameworks targeting heterogeneous clusters that include GPUs are still in their infancy. Further, GPU software stacks (e.g., the CUDA driver and runtime) currently provide very limited support for concurrency. In this paper, we propose a runtime system that provides abstraction and sharing of GPUs while isolating concurrent applications from one another. A central component of our runtime is a memory manager that provides a virtual memory abstraction to the applications. Our runtime is flexible in terms of scheduling policies and allows dynamic (as opposed to programmer-defined) binding of applications to GPUs. In addition, our framework supports dynamic load balancing and dynamic upgrade and downgrade of GPUs, and is resilient to GPU failures. Our runtime can be deployed in combination with VM-based cloud computing services to allow virtualization of heterogeneous clusters, or in combination with HPC cluster resource managers to form an integrated resource management infrastructure for heterogeneous clusters. Experiments conducted on a three-node cluster show that our GPU sharing scheme improves performance over serialized execution by up to 28% on short-running jobs and up to 50% on long-running jobs. Further, dynamic inter-node load balancing leads to an additional 18-20% performance benefit.