Distributed snapshots: determining global states of distributed systems
ACM Transactions on Computer Systems (TOCS)
Dummynet: a simple approach to the evaluation of network protocols
ACM SIGCOMM Computer Communication Review
Precision timestamping of network packets
IMW '01 Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Isolating cause-effect chains from computer programs
Proceedings of the 10th ACM SIGSOFT symposium on Foundations of software engineering
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
A solver for the network testbed mapping problem
ACM SIGCOMM Computer Communication Review
Robust synchronization of software clocks across the internet
Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
ReVirt: enabling intrusion analysis through virtual-machine logging and replay
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
An integrated experimental environment for distributed systems and networks
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
The design and implementation of Zap: a system for migrating computing environments
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Rx: treating bugs as allergies---a safe method to survive software failures
Proceedings of the twentieth ACM symposium on Operating systems principles
Using model checking to find serious file system errors
ACM Transactions on Computer Systems (TOCS)
Debugging operating systems with time-traveling virtual machines
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Measuring CPU overhead for I/O processing in the Xen virtual machine monitor
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Flashback: a lightweight extension for rollback and deterministic replay for software debugging
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Live migration of virtual machines
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
VMM-independent graphics acceleration
Proceedings of the 3rd international conference on Virtual execution environments
To infinity and beyond: time-warped network emulation
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Triage: diagnosing production run failures at the user's site
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Execution replay of multiprocessor virtual machines
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Parallax: virtual disks for virtual machines
Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008
Remus: high availability via asynchronous virtual machine replication
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
DieCast: testing distributed systems with an accurate scale model
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Bridging the gap between software and hardware techniques for I/O virtualization
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
The flexlab approach to realistic evaluation of networked systems
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Life, death, and the critical transition: finding liveness bugs in systems code
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Friday: global comprehension for distributed replay
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Proceedings of the 33rd International Conference on Software Engineering
Beyond disk imaging for preserving user state in network testbeds
CSET'12 Proceedings of the 5th USENIX conference on Cyber Security Experimentation and Test
HotSnap: a hot distributed snapshot system for virtual machine cluster
LISA'13 Proceedings of the 27th international conference on Large Installation System Administration
Hi-index | 0.00 |
Emulab is a testbed for networked and distributed systems experimentation. Two guiding principles of its design are realism and control of experimentation. There is an inherent tension between these goals, however, and in some aspects of the testbed's design, Emulab's implementers favored realism over control. Thus, Emulab provides wide-ranging control over an experiment's environment and initial conditions, but relatively little control over its execution--in particular, the ability to suspend, preempt, or replay the experiment. We have extended Emulab with a new means of control over experiment execution: the ability to cleanly checkpoint the execution of the set of nodes and networks that comprise an experiment. Conventional checkpoint mechanisms can easily degrade the fidelity of experiment results as a consequence of checkpoint downtimes, overheads of background state saving, and unintended distributed checkpoint synchronization effects. In this paper we demonstrate a checkpointing technique that is transparent with respect to the execution of the system under test, almost completely concealing the underlying checkpoint activity. Building on our checkpoint mechanism, we have implemented two powerful facilities for experiment execution control: the ability to preemptively swap-out experiments without losing their run-time state, and the ability to time-travel through the run of a system.