Is Linux kernel oops useful or not?

Authors:
Takeshi Yoshimura; Hiroshi Yamada; Kenji Kono
Affiliations:
Keio University; Keio University, CREST, JST; Keio University, CREST, JST
Venue:
HotDep'12 Proceedings of the Eighth USENIX conference on Hot Topics in System Dependability
Year:
2012


Abstract

A Linux kernel oops is triggered when the kernel detects an erroneous state inside itself. The oops kills the offending process and lets Linux continue operating, but with compromised reliability. In this paper, we investigate how reliable Linux is after a kernel oops. To do so, we analyze the scope of error propagation through an experimental fault-injection campaign in Linux 2.6.38. The error propagation scope is process-local if an error is confined to the process context that activated it, and kernel-global if the error propagates to other processes' contexts or to global data structures. If the scope is process-local, Linux can remain reliable even after a kernel oops. Our findings are twofold. First, the error propagation scope is mostly process-local, so Linux remains consistent after a kernel oops in most cases. Second, when kernel-global errors occur, Linux stops its execution before the inconsistent states are accessed, because synchronization primitives prevent other processes from accessing them.
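
The distinction between process-local and kernel-global scope can be illustrated with a minimal sketch. It is not the authors' tooling: the `CorruptedObject` record, the `propagation_scope` function, and the convention that `owner_pid=None` marks a global kernel data structure are all hypothetical names chosen for illustration. The sketch assumes that a fault-injection run has already produced a list of the objects an error corrupted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

PROCESS_LOCAL = "process-local"
KERNEL_GLOBAL = "kernel-global"


@dataclass(frozen=True)
class CorruptedObject:
    """Hypothetical record of one object corrupted by an injected fault."""
    name: str
    # PID of the process context that owns the object.
    # None is an assumed convention for a global kernel data structure.
    owner_pid: Optional[int]


def propagation_scope(faulting_pid: int,
                      corrupted: Iterable[CorruptedObject]) -> str:
    """Classify the error propagation scope as defined in the abstract.

    The scope is kernel-global as soon as one corrupted object is global
    or belongs to a process other than the one that activated the error.
    Otherwise the error stayed in the activating context: process-local.
    """
    for obj in corrupted:
        if obj.owner_pid is None or obj.owner_pid != faulting_pid:
            return KERNEL_GLOBAL
    return PROCESS_LOCAL


if __name__ == "__main__":
    # Error confined to the context of the process that activated it.
    print(propagation_scope(100, [CorruptedObject("kernel stack", 100)]))
    # Error that also reached a global data structure.
    print(propagation_scope(100, [CorruptedObject("kernel stack", 100),
                                  CorruptedObject("inode cache", None)]))
```

Under this reading, killing the offending process is enough to discard every corrupted object in the process-local case, which is why the abstract ties that case to Linux remaining reliable after an oops.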