Dealing with disaster: surviving misbehaved kernel extensions
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
An empirical study of operating systems errors
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Improving the reliability of commodity operating systems
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Scalable statistical bug isolation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Mondrix: memory isolation for linux using mondriaan memory protection
Proceedings of the twentieth ACM symposium on Operating systems principles
Linux Device Drivers, 3rd Edition
Linux Device Drivers, 3rd Edition
Live migration of virtual machines
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Privtrans: automatically partitioning programs for privilege separation
SSYM'04 Proceedings of the 13th conference on USENIX Security Symposium - Volume 13
Failure Resilience for Device Drivers
DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
SafeDrive: safe and recoverable extensions using language-based techniques
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
XFI: software guards for system address spaces
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Bro: a system for detecting network intruders in real-time
SSYM'98 Proceedings of the 7th conference on USENIX Security Symposium - Volume 7
Secure web applications via automatic partitioning
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
TxLinux: using and managing hardware transactional memory in an operating system
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
The design and implementation of microdrivers
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Enforcing authorization policies using transactional memory introspection
Proceedings of the 15th ACM conference on Computer and communications security
xCalls: safe I/O in memory transactions
Proceedings of the 4th ACM European conference on Computer systems
Proceedings of the 4th ACM European conference on Computer systems
SoftBound: highly compatible and complete spatial memory safety for c
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Live migration of direct-access devices
ACM SIGOPS Operating Systems Review
Fast byte-granularity software fault isolation
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Tolerating hardware device failures in software
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Protecting Commodity Operating System Kernels from Vulnerable Device Drivers
ACSAC '09 Proceedings of the 2009 Annual Computer Security Applications Conference
Membrane: operating system support for restartable file systems
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Augmented smartphone applications through clone cloud execution
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
CuriOS: improving reliability through operating system structure
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Device driver safety through a reference validation mechanism
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Tolerating malicious device drivers in Linux
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
Operating system implications of fast, cheap, non-volatile memory
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Software fault isolation with API integrity and multi-principal modules
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Understanding modern device drivers
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Detecting and recovering from in-core hardware faults through software anomaly treatment
Detecting and recovering from in-core hardware faults through software anomaly treatment
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
VirtuOS: an operating system with kernel virtualization
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Techniques for efficient in-memory checkpointing
Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
Hi-index | 0.00 |
Recovering faults in drivers is difficult compared to other code because their state is spread across both memory and a device. Existing driver fault-tolerance mechanisms either restart the driver and discard its state, which can break applications, or require an extensive logging mechanism to replay requests and recreate driver state. Even logging may be insufficient, though, if the semantics of requests are ambiguous. In addition, these systems either require large subsystems that must be kept up-to-date as the kernel changes, or require substantial rewriting of drivers. We present a new driver fault-tolerance mechanism that provides fine-grained control over the code protected. Fine-Grained Fault Tolerance (FGFT) isolates driver code at the granularity of a single entry point. It executes driver code as a transaction, allowing roll back if the driver fails. We develop a novel checkpointing mechanism to save and restore device state using existing power management code. Unlike past systems, FGFT can be incrementally deployed in a single driver without the need for a large kernel subsystem, but at the cost of small modifications to the driver. In the evaluation, we show that FGFT can have almost zero runtime cost in many cases, and that checkpoint-based recovery can reduce the duration of a failure by 79% compared to restarting the driver. Finally, we show that applying FGFT to a driver requires little effort, and the majority of drivers in common classes already contain the power-management code needed for checkpoint/restore.