Shadowfax: scaling in heterogeneous cluster systems via GPGPU assemblies

Authors:
Alexander M. Merritt; Vishakha Gupta; Abhishek Verma; Ada Gavrilovska; Karsten Schwan
Affiliations:
Georgia Institute of Technology, Atlanta, GA, USA (all authors)
Venue:
Proceedings of the 5th international workshop on Virtualization technologies in distributed computing
Year:
2011

Abstract

Systems with specialized processors such as those used for accelerating computations (like NVIDIA's graphics processors or IBM's Cell) have proven their utility in terms of higher performance and lower power consumption. They have also been shown to outperform general-purpose processors for graphics-intensive or high-performance applications and for enterprise applications like modern financial codes or web hosts that require scalable image processing. These facts are driving tremendous growth in accelerator-based platforms in the high-performance domain, with systems like Keeneland and supercomputers like Tianhe-1 and RoadRunner, and even in data center systems like Amazon's EC2. The physical hardware in these systems, once purchased and assembled, is not reconfigurable and is expensive to modify or upgrade. This can eventually limit applications' performance and scalability unless they are rewritten to match specific versions of hardware and compositions of components, both for single nodes and for clusters of machines. To address this problem and to support increased flexibility in usage models for CUDA-based GPGPU applications, our research proposes GPGPU assemblies, where each assembly combines a desired number of CPUs and CUDA-supported GPGPUs to form a 'virtual execution platform' for an application. System-level software then creates and manages assemblies, including mapping them seamlessly to the actual cluster- and node-level hardware resources present in the system. Experimental evaluations of the initial implementation of GPGPU assemblies demonstrate their feasibility and the advantages derived from their use.
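
The assembly idea described above can be illustrated with a minimal sketch. It is not the Shadowfax implementation: the `Node` and `Assembly` classes, the `create_assembly` function, and the greedy placement policy are all hypothetical, chosen only to show how a request for a number of CPUs and GPUs might be mapped onto node-level resources that no single machine provides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A physical cluster node (hypothetical model, not from the paper)."""
    name: str
    free_cpus: int
    free_gpus: int


@dataclass
class Assembly:
    """A virtual execution platform: requested CPUs and GPUs plus their placement."""
    cpus: int
    gpus: int
    # Maps node name -> (cpus taken, gpus taken) on that node.
    placement: dict = field(default_factory=dict)


def create_assembly(nodes, cpus, gpus):
    """Greedily map a request onto nodes; return None if the cluster cannot satisfy it."""
    if sum(n.free_cpus for n in nodes) < cpus or sum(n.free_gpus for n in nodes) < gpus:
        return None  # Reject before mutating any node state.
    assembly = Assembly(cpus, gpus)
    need_cpus, need_gpus = cpus, gpus
    # Assumed policy: prefer nodes with the most free GPUs so the assembly spans few nodes.
    for node in sorted(nodes, key=lambda n: n.free_gpus, reverse=True):
        take_c = min(node.free_cpus, need_cpus)
        take_g = min(node.free_gpus, need_gpus)
        if take_c == 0 and take_g == 0:
            continue
        node.free_cpus -= take_c
        node.free_gpus -= take_g
        need_cpus -= take_c
        need_gpus -= take_g
        assembly.placement[node.name] = (take_c, take_g)
        if need_cpus == 0 and need_gpus == 0:
            break
    return assembly


cluster = [Node("node0", free_cpus=4, free_gpus=2), Node("node1", free_cpus=4, free_gpus=1)]
# One CPU driving three GPUs: more GPUs than any single node holds.
assembly = create_assembly(cluster, cpus=1, gpus=3)
print(assembly.placement)
```

In this sketch the application sees one platform with three GPUs, while the placement records that two of them sit on `node0` and one on `node1`. A real system would additionally have to forward CUDA calls to the remote GPU, which the sketch does not attempt.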