Suspending, migrating and resuming HPC virtual clusters

Authors:
Paolo Anedda;Simone Leo;Simone Manca;Massimo Gaggero;Gianluigi Zanetti
Affiliations:
Distributed Computing Group, CRS4, Edificio 1, Polaris, Pula, Italy (all authors)
Venue:
Future Generation Computer Systems
Year:
2010

Abstract

A systematic study of issues related to suspending, migrating and resuming virtual clusters for data-driven HPC applications is presented. The focus is on nontrivial virtual clusters, that is, clusters in which the running computation is expected to be coordinated and strongly coupled. It is shown that, for such clusters, all cluster-level operations, such as start and save, should be performed as synchronously as possible on all nodes, which introduces the need for barriers at the virtual cluster computing meta-level. Once a synchronization mechanism is provided and appropriate transport strategies have been set up, it is possible to suspend, migrate and resume whole virtual clusters composed of "heavy" virtual machines (4 GB RAM, 6 GB disk images) in times on the order of a few minutes without disrupting the parallel computation running inside them, albeit a computation of the MapReduce type. The approach is intrinsically parallel and should scale without problems to larger virtual clusters.
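The barrier requirement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each virtual node is represented by a local thread, and the names `synchronized_save` and `fake_save` are hypothetical stand-ins for whatever mechanism actually triggers a save on a node. It shows only the ordering property, namely that no node starts saving until every node has reported that it is ready.

```python
import threading


def synchronized_save(node_names, save_fn):
    """Call save_fn on every node only after all nodes are ready.

    Threads stand in for cluster nodes (an assumption made for this
    sketch); a real deployment would need a distributed barrier.
    """
    barrier = threading.Barrier(len(node_names))
    log = []
    log_lock = threading.Lock()

    def worker(name):
        with log_lock:
            log.append((name, "ready"))
        # Block here until every node has reached this point.
        barrier.wait()
        save_fn(name)
        with log_lock:
            log.append((name, "saved"))

    threads = [threading.Thread(target=worker, args=(n,)) for n in node_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


def fake_save(name):
    # Hypothetical placeholder for the per-node save operation.
    pass


events = synchronized_save(["vm0", "vm1", "vm2", "vm3"], fake_save)
for name, phase in events:
    print(name, phase)
```

Because every worker records "ready" before waiting on the barrier, all "ready" entries appear in the log before any "saved" entry, whatever the thread scheduling. The same pattern would apply to a synchronized start or resume.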