We describe a strategy for enabling existing commodity operating systems to recover from unexpected run-time errors in nearly any part of the kernel, including core kernel components. Our approach is dynamic and request-oriented: it isolates the effects of a fault to the requests that caused the fault rather than to static kernel components. The approach is based on the notion of "recovery domains," an organizing principle that enables rollback of the state affected by a request in a multithreaded system with minimal impact on other requests or threads. We have applied this approach to v2.4.22 and v2.6.27 of the Linux kernel. It required 132 lines of changed or new code; all other changes are performed by a simple compiler instrumentation pass. Our experiments show that the approach is able to recover from otherwise fatal faults with minimal collateral impact during a recovery event.
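The request-oriented rollback idea can be illustrated with a small sketch. This is not the paper's implementation: the `RecoveryDomain` class, the `run_request` helper, and the `kernel_state` dictionary are all hypothetical names chosen for illustration, and the example runs requests one after another, so it ignores the dependencies between concurrent requests that a real multithreaded kernel would have to track. It only shows the general pattern of logging old values before each write and undoing them when a request faults.

```python
class RecoveryDomain:
    """Hypothetical per-request undo log (illustrative only)."""

    _MISSING = object()  # marks keys that did not exist before the write

    def __init__(self, name):
        self.name = name
        self.undo_log = []  # (store, key, previous value), in write order

    def write(self, store, key, value):
        # Save the old value before overwriting it. In this sketch the call
        # is written by hand; an instrumentation pass could insert it.
        self.undo_log.append((store, key, store.get(key, self._MISSING)))
        store[key] = value

    def rollback(self):
        # Undo writes in reverse order so the earliest value wins.
        while self.undo_log:
            store, key, old = self.undo_log.pop()
            if old is self._MISSING:
                store.pop(key, None)
            else:
                store[key] = old

    def commit(self):
        # The request finished cleanly; its undo information is discarded.
        self.undo_log.clear()


def run_request(domain, handler, state):
    """Run one request; roll back only that request's writes on a fault."""
    try:
        handler(domain, state)
    except Exception:
        domain.rollback()
        return False
    domain.commit()
    return True


def good_request(domain, state):
    domain.write(state, "open_files", state["open_files"] + 1)


def faulty_request(domain, state):
    domain.write(state, "free_pages", state["free_pages"] - 10)
    domain.write(state, "scratch", "partial")
    raise RuntimeError("simulated fault")


kernel_state = {"open_files": 3, "free_pages": 100}
ok_good = run_request(RecoveryDomain("req-1"), good_request, kernel_state)
ok_bad = run_request(RecoveryDomain("req-2"), faulty_request, kernel_state)
```

In this sketch the faulting request's two writes are undone while the effect of the successful request is kept, which mirrors the goal of confining a fault to the request that caused it.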