Hypervisor-based fault tolerance
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
ReVirt: enabling intrusion analysis through virtual-machine logging and replay
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Remus: high availability via asynchronous virtual machine replication
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Decoupling dynamic program analysis from execution in virtual environments
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Hi-index | 0.00 |
As embedded systems take more important roles at many places, it is more important for them to be able to show the evidences of system failures. Providing such evidences makes it easier to investigate the root causes of the failures and to prove the responsible parties. This paper proposes simultaneous logging and replaying of a system that enables recording evidences of system failures. The proposed system employs two virtual machines, one for the primary execution and the other for the backup execution. The backup virtual machine maintains the past state of the primary virtual machine along with the log to make the backup the same state as the primary. When a system failure occurs on the primary virtual machine, the VMM saves the backup state and the log. The saved backup state and the log can be used as an evidence. By replaying the backup virtual machine from the saved state following the saved log, the execution path to the failure can be completely analyzed. We developed such a logging and replaying feature in a VMM. It can log and replay the execution of the Linux operating system. The experiment results show the overhead of the primary execution is only fractional.