Capturing global, consistent snapshots of a distributed computing session or system is essential to the system's reliability, manageability, and accountability. Despite the large body of work at the application, library, and operating system levels, we identify a gap in the spectrum of distributed snapshot techniques: taking snapshots of the entire distributed runtime environment. Such a capability is applicable to a number of scenarios. In this paper, we realize this capability in the context of virtual networked environments. More specifically, by adapting and implementing a distributed snapshot algorithm, we enable the capture of causally consistent snapshots of the virtual machines in a virtual networked environment. The snapshot-taking operations require no modification to the applications or operating systems running inside the virtual environment. Preliminary evaluation results indicate that our technique incurs acceptable overhead and causes little disruption to the normal operation of the virtual environment.
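The abstract does not say which distributed snapshot algorithm is adapted, so the following is only an illustrative sketch of the classic marker-based approach over FIFO channels, not the authors' implementation. Plain processes exchanging integer amounts stand in for virtual machines exchanging network traffic, and every name here (`Process`, `Network`, `start_snapshot`, and so on) is hypothetical. The point it illustrates is what "causally consistent" means in practice: the recorded local states plus the recorded in-flight messages account for everything in the system exactly once.

```python
from collections import deque

MARKER = "MARKER"  # control message that separates pre- and post-snapshot traffic


class Process:
    """Hypothetical stand-in for one node (e.g. a VM) holding a numeric state."""

    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance
        self.peers = peers
        self.recorded_balance = None  # local state saved at snapshot time
        self.channel_log = {}         # sender id -> in-flight amounts recorded
        self.recording = set()        # incoming channels still being recorded


class Network:
    """Fully connected set of processes with one FIFO channel per direction."""

    def __init__(self, balances):
        ids = list(balances)
        self.procs = {
            p: Process(p, b, [q for q in ids if q != p])
            for p, b in balances.items()
        }
        self.channels = {(s, d): deque() for s in ids for d in ids if s != d}

    def send(self, src, dst, amount):
        # The amount leaves the sender immediately and travels on the channel.
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(amount)

    def start_snapshot(self, pid):
        self._record(self.procs[pid])

    def _record(self, proc):
        # Save local state, start recording all incoming channels,
        # and send a marker on every outgoing channel.
        proc.recorded_balance = proc.balance
        proc.channel_log = {q: [] for q in proc.peers}
        proc.recording = set(proc.peers)
        for q in proc.peers:
            self.channels[(proc.pid, q)].append(MARKER)

    def deliver(self, src, dst):
        msg = self.channels[(src, dst)].popleft()
        proc = self.procs[dst]
        if msg == MARKER:
            if proc.recorded_balance is None:
                self._record(proc)       # first marker seen: take local snapshot
            proc.recording.discard(src)  # channel src -> dst is now closed
        else:
            proc.balance += msg
            if src in proc.recording:
                # Sent before the sender's snapshot, received after ours:
                # the message belongs to the channel state.
                proc.channel_log[src].append(msg)

    def drain(self):
        # Deliver messages round-robin until every channel is empty.
        progressed = True
        while progressed:
            progressed = False
            for (src, dst), queue in self.channels.items():
                if queue:
                    self.deliver(src, dst)
                    progressed = True

    def snapshot_total(self):
        # Recorded local states plus recorded in-flight messages.
        return sum(
            p.recorded_balance + sum(sum(v) for v in p.channel_log.values())
            for p in self.procs.values()
        )


net = Network({"A": 100, "B": 50, "C": 25})
net.send("A", "B", 10)   # in flight when the snapshot starts
net.start_snapshot("B")
net.send("C", "A", 5)    # sent before C has seen any marker
net.drain()
print(net.snapshot_total())
```

In this run the recorded local states alone sum to less than the 175 units in the system; the difference is held in the recorded channel states, which is why a consistent snapshot has to capture in-flight messages as well as node state.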