Virtualization aware job schedulers for checkpoint-restart

Authors: R. Badrinath, R. Krishnakumar, R. K. Palanivel Rajan
Affiliations: Hewlett-Packard, USA (all authors)
Venue: ICPADS '07: Proceedings of the 13th International Conference on Parallel and Distributed Systems, Volume 02
Year: 2007

Abstract

Application checkpoint and restart has been widely studied over the last several decades. Despite an immense volume of theory and several research-project-level implementations, there are very few working solutions for parallel distributed applications (such as MPI programs on a cluster). We describe our experience enhancing a job scheduler to leverage the mechanisms of a virtual machine environment to support checkpoint-restart. We also describe the basic coordinated checkpoint-restart framework that we implemented and on which this solution is based.
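
As a general illustration of what coordinated checkpoint-restart means, the following is a minimal in-process sketch, not the authors' implementation. It assumes the usual three-step pattern: quiesce every process, save all states as one global checkpoint, then resume. The names `Worker`, `Coordinator`, `checkpoint`, and `restart` are hypothetical; a real system would pause MPI ranks or save virtual machine images rather than copy a small Python dictionary.

```python
import copy
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical stand-in for one process (rank) of a parallel job."""
    rank: int
    counter: int = 0
    paused: bool = False

    def step(self):
        # Advance the computation unless the coordinator has quiesced us.
        if not self.paused:
            self.counter += self.rank + 1

    def snapshot(self):
        # Capture the state needed to resume this worker later.
        return {"rank": self.rank, "counter": self.counter}

    def restore(self, state):
        self.counter = state["counter"]


class Coordinator:
    """Hypothetical coordinator that takes one consistent global checkpoint."""

    def __init__(self, workers):
        self.workers = workers
        self.checkpoints = []  # each entry holds the states of all workers

    def checkpoint(self):
        # Phase 1: quiesce every worker so no state changes mid-checkpoint.
        for w in self.workers:
            w.paused = True
        # Phase 2: save all worker states together as one global checkpoint.
        states = [copy.deepcopy(w.snapshot()) for w in self.workers]
        self.checkpoints.append(states)
        # Phase 3: let every worker continue.
        for w in self.workers:
            w.paused = False
        return len(self.checkpoints) - 1  # index of the new checkpoint

    def restart(self, epoch):
        # Roll every worker back to the same global checkpoint.
        for w, state in zip(self.workers, self.checkpoints[epoch]):
            w.restore(state)
            w.paused = False


if __name__ == "__main__":
    workers = [Worker(rank=r) for r in range(3)]
    coord = Coordinator(workers)
    for _ in range(2):
        for w in workers:
            w.step()
    epoch = coord.checkpoint()
    for w in workers:
        w.step()              # progress that will be discarded
    coord.restart(epoch)      # all workers return to the saved state
    print([w.counter for w in workers])
```

The point of the coordination is that every worker is rolled back to the same checkpoint, so no worker resumes from a state that the others have never seen.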