Building a Self-Healing Operating System

Authors:
Francis M. David; Roy H. Campbell
Affiliations:
University of Illinois at Urbana-Champaign, USA; University of Illinois at Urbana-Champaign, USA
Venue:
DASC '07 Proceedings of the Third IEEE International Symposium on Dependable, Autonomic and Secure Computing
Year:
2007

Citing 0
Cited 3

Building a self-healing embedded system in a multi-OS environment
Proceedings of the 2009 ACM symposium on Applied Computing

Fault injection framework for system resilience evaluation: fake faults for finding future failures
Proceedings of the 2009 workshop on Resiliency in high performance

CuriOS: improving reliability through operating system structure
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation

Abstract

User applications and data in volatile memory are usually lost when an operating system crashes because of errors caused by either hardware or software faults. This is because most operating systems are designed to stop working when some internal errors are detected despite the possibility that user data and applications might still be intact and recoverable. Techniques like exception handling, code reloading, operating system component isolation, micro-rebooting, automatic system service restarts, watchdog-timer-based recovery and transactional components can be applied to attempt self-healing of an operating system from a wide variety of errors. Fault injection experiments show that these techniques can be used to continue running user applications after transparently recovering the operating system in a large percentage of cases. In cases where transparent recovery is not possible, individual process recovery can be attempted as a last resort.