The Design and Verification of the Rio File Cache
IEEE Transactions on Computers
An empirical study of operating systems errors
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
The Systematic Improvement of Fault Tolerance in the Rio File Cache
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
Improving the reliability of commodity operating systems
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Emulation of Software Faults: A Field Data Study and a Practical Approach
IEEE Transactions on Software Engineering
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
seL4: formal verification of an OS kernel
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Otherworld: giving applications a chance to survive OS kernel crashes
Proceedings of the 5th European conference on Computer systems
Faults in linux: ten years later
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Phase-based reboot: Reusing operating system execution phases for cheap reboot-based recovery
DSN '11 Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems&Networks
CloudVal: A framework for validation of virtualization environment in cloud infrastructure
DSN '11 Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems&Networks
Experimental Analysis of Binary-Level Software Fault Injection in Complex Software
EDCC '12 Proceedings of the 2012 Ninth European Dependable Computing Conference
Hi-index | 0.00 |
Linux kernel oops is invoked when the kernel detects an erroneous state inside itself. It kills an offending process and allows Linux to continue its operation under a compromised reliability. We investigate how reliable Linux is after a kernel oops in this paper. To investigate the reliability after a kernel oops, we analyze the scope of error propagation through an experimental campaign of fault injection in Linux 2.6.38. The error propagation scope is process-local if an error is confined in the process context that activated it, while the scope is kernel-global if an error propagates to other processes' contexts or global data structures. If the scope is process-local, Linux can be reliable even after a kernel oops. Our findings are twofold. First, the error propagation scope is mostly process-local. Thus, Linux remains consistent after a kernel oops in most cases. Second, Linux stops its execution before accessing inconsistent states when kernel-global errors occur because synchronization primitives prevent the inconsistent states from being accessed by other processes.