An empirical study of operating systems errors
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
The distribution of faults in a large industrial software system
ISSTA '02 Proceedings of the 2002 ACM SIGSOFT international symposium on Software testing and analysis
A Theory of Programming Language Semantics
A Theory of Programming Language Semantics
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A Simple Way to Estimate the Cost of Downtime
LISA '02 Proceedings of the 16th USENIX conference on System administration
Operating Systems Design and Implementation (3rd Edition)
Operating Systems Design and Implementation (3rd Edition)
SAFECode: enforcing alias analysis for weakly typed languages
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
ACM Transactions on Computer Systems (TOCS)
Failure Resilience for Device Drivers
DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
SafeDrive: safe and recoverable extensions using language-based techniques
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Recovery domains: an organizing principle for recoverable operating systems
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
seL4: formal verification of an OS kernel
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
PAriCheck: an efficient pointer arithmetic checker for C programs
ASIACCS '10 Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security
Otherworld: giving applications a chance to survive OS kernel crashes
Proceedings of the 5th European conference on Computer systems
Membrane: operating system support for restartable file systems
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
CuriOS: improving reliability through operating system structure
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Design principles for contactile computing
Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference on Knowledge, Innovation and Leadership in a Diverse, Multidisciplinary Environment
Enhanced operating system security through efficient and fine-grained address space randomization
Security'12 Proceedings of the 21st USENIX conference on Security symposium
Techniques for efficient in-memory checkpointing
Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
When slower is faster: on heterogeneous multicores for reliable systems
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Hi-index | 0.00 |
We present an in-depth analysis of the crash-recovery problem and propose a novel approach to recover from otherwise fatal operating system (OS) crashes. We show how an unconventional, but careful, OS design, aided by automatic compiler-based code instrumentation, offers a practical solution towards the survivability of the entire system. Current results are encouraging and show that our approach is able to recover even the most critical OS subsystems without exposing the failure to user applications or hampering the scalability of the system.