An Operating System Architecture for Future Information Appliances
SEUS '08 Proceedings of the 6th IFIP WG 10.2 international workshop on Software Technologies for Embedded and Ubiquitous Systems
Tolerating hardware device failures in software
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Membrane: Operating system support for restartable file systems
ACM Transactions on Storage (TOS)
Membrane: operating system support for restartable file systems
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
CuriOS: improving reliability through operating system structure
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Decaf: moving device drivers to a modern language
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Tolerating malicious device drivers in Linux
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
Testing closed-source binary device drivers with DDT
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
HotDep'10 Proceedings of the Sixth international conference on Hot topics in system dependability
vIOMMU: efficient IOMMU emulation
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
A comparative experimental study of software rejuvenation overhead
Performance Evaluation
Safe and automatic live update for operating systems
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Fine-grained fault tolerance using device checkpoints
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Optimizing decomposition of software architecture for local recovery
Software Quality Control
Reliability analysis reloaded: how will we survive?
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Studies have shown that device drivers and extensions contain 3-7 times more bugs than other operating system code and thus are more likely to fail. Therefore, we present a failure-resilient operating system design that can recover from dead drivers and other critical components--primarily through monitoring and replacing malfunctioning components on the fly--transparent to applications and without user intervention. This paper focuses on the post-mortem recovery procedure. We explain the working of our defect detection mechanism, the policy-driven recovery procedure, and post-restart reintegration of the components. Furthermore, we discuss the concrete steps taken to recover from network, block device, and character device driver failures. Finally, we evaluate our design using performance measurements, software fault-injection experiments, and an analysis of the reengineering effort.