Failure Resilience for Device Drivers

  • Authors:
  • Jorrit N. Herder; Herbert Bos; Ben Gras; Philip Homburg; Andrew S. Tanenbaum

  • Affiliations:
  • Vrije Universiteit, The Netherlands (all authors)

  • Venue:
  • DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
  • Year:
  • 2007

Abstract

Studies have shown that device drivers and extensions contain 3-7 times more bugs than other operating system code and thus are more likely to fail. Therefore, we present a failure-resilient operating system design that can recover from dead drivers and other failed critical components, primarily by monitoring and replacing malfunctioning components on the fly, transparently to applications and without user intervention. This paper focuses on the post-mortem recovery procedure. We explain the workings of our defect detection mechanism, the policy-driven recovery procedure, and the post-restart reintegration of the components. Furthermore, we discuss the concrete steps taken to recover from network, block device, and character device driver failures. Finally, we evaluate our design using performance measurements, software fault-injection experiments, and an analysis of the reengineering effort.
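
The monitor-and-replace idea in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `Driver`, `RestartPolicy`, and `Supervisor`, the boolean liveness flag, and the "restart up to N times, then give up" policy are all hypothetical choices made for illustration. A real system would detect failures through process exits or missed heartbeat messages and would start a fresh driver process rather than a Python object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Driver:
    """Hypothetical stand-in for a driver process."""
    name: str
    alive: bool = True
    generation: int = 0  # how many times this driver has been replaced

    def heartbeat(self) -> bool:
        # Assumption: liveness is a simple flag; a real monitor would
        # wait for a reply message or an exit notification.
        return self.alive


@dataclass
class RestartPolicy:
    """Hypothetical policy: restart a limited number of times."""
    max_restarts: int = 3

    def decide(self, restarts_so_far: int) -> str:
        return "restart" if restarts_so_far < self.max_restarts else "give_up"


@dataclass
class Supervisor:
    """Monitors registered drivers and replaces dead ones per policy."""
    policy: RestartPolicy
    drivers: Dict[str, Driver] = field(default_factory=dict)
    restarts: Dict[str, int] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def register(self, driver: Driver) -> None:
        self.drivers[driver.name] = driver
        self.restarts[driver.name] = 0

    def check_once(self) -> None:
        # Iterate over a snapshot because entries may be replaced or removed.
        for name, driver in list(self.drivers.items()):
            if driver.heartbeat():
                continue  # driver is healthy, nothing to do
            if self.policy.decide(self.restarts[name]) == "restart":
                # Replace the dead instance with a fresh one under the
                # same name, so clients can keep addressing it.
                fresh = Driver(name, generation=driver.generation + 1)
                self.drivers[name] = fresh
                self.restarts[name] += 1
                self.log.append(f"restarted {name} (generation {fresh.generation})")
            else:
                del self.drivers[name]
                self.log.append(f"gave up on {name}")
```

In this sketch the replacement keeps the driver's name, which loosely mirrors the goal of recovery that is transparent to applications; reintegrating the restarted component with its dependents is left out entirely.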