In modern commodity operating systems, core functionality is usually designed on the assumption that the underlying processor hardware always functions correctly. Shrinking hardware feature sizes break this assumption. Existing approaches to coping with such hardware errors either rely on hardware functionality that is not available in commercial-off-the-shelf (COTS) systems or impose additional requirements on software development, making reuse of existing software hard, if not impossible. In this paper we present Romain, a framework that provides transparent redundant multithreading as an operating system service for hardware error detection and recovery. When applied to a standard benchmark suite, Romain incurs a runtime overhead of at most 30% for triple-modular redundancy (TMR), and in many cases the overhead remains below 5%. Furthermore, our approach minimizes the complexity added to the operating system for the sake of replication.
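To illustrate why triple-modular redundancy supports both error detection and recovery, here is a minimal sketch of a majority voter over three replica states. It assumes, for illustration only, that replicas are compared at points where they would otherwise make their state externally visible (for example, the arguments of a system call), and that such a state can be compared as a plain hashable value. The names `vote`, `recover`, and `NoMajorityError` are hypothetical and do not describe Romain's actual interfaces.

```python
from collections import Counter
from collections.abc import Hashable
from typing import List, Tuple


class NoMajorityError(Exception):
    """Raised when no two replicas agree: the error is detected but cannot be masked."""


def vote(states: List[Hashable]) -> Tuple[Hashable, List[int]]:
    """Majority-vote over three replica states (hypothetical helper, not Romain's API).

    Returns the majority state and the indices of replicas that disagree with it.
    """
    if len(states) != 3:
        raise ValueError("TMR expects exactly three replicas")
    winner, count = Counter(states).most_common(1)[0]
    if count < 2:
        # All three replicas differ, so there is no majority to recover from.
        raise NoMajorityError(states)
    faulty = [i for i, s in enumerate(states) if s != winner]
    return winner, faulty


def recover(states: List[Hashable]) -> List[Hashable]:
    """Detect divergent replicas and overwrite their state with the majority state."""
    winner, faulty = vote(states)
    repaired = list(states)
    for i in faulty:
        repaired[i] = winner  # recovery: copy the agreed-upon state into the faulty replica
    return repaired


if __name__ == "__main__":
    # Hypothetical example: replica 1 suffered a bit flip in a system call argument.
    states = [("write", 1, 0x40), ("write", 1, 0x41), ("write", 1, 0x40)]
    print(vote(states))  # (('write', 1, 64), [1])
    print(recover(states))
```

With only two replicas, a mismatch can be detected but not attributed to either copy; the third replica is what allows the voter to identify the faulty one and repair it, at the cost of a third execution.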