Recent failures of production systems have highlighted the importance of tolerating faults beyond crashes. The industry has so far addressed this problem by hardening crash-tolerant systems with ad hoc error detection checks, potentially overlooking critical fault scenarios. We propose a generic and principled hardening technique for Arbitrary State Corruption (ASC) faults, which specifically model the effects of realistic data corruptions on distributed processes. Hardening requires neither trusted components nor replication of the process over multiple physical servers. We implemented a wrapper library to transparently harden distributed processes. To exercise our library and evaluate our technique, we obtained ASC-tolerant versions of Paxos, of a subset of the ZooKeeper API, and of an eventually consistent storage system by implementing crash-tolerant protocols and automatically hardening them with our library. Our evaluation shows that our ASC-hardened state machine replication achieves up to 70% higher throughput than its Byzantine-tolerant counterpart.
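The abstract does not describe how the wrapper library works internally. Purely as an illustration, the Python sketch below shows one way a single process could detect state corruption locally, without trusted components or extra servers: it keeps two copies of its state, runs each deterministic event handler on both, compares the outputs, and attaches a checksum to every outgoing message that the receiver verifies before processing. The names `HardenedProcess`, `checksum`, `CorruptionDetected`, and `counter_handler`, the JSON message layout, and the choice of CRC-32 are all hypothetical and are not the authors' API; the sketch only detects corruption and does not attempt recovery.

```python
import copy
import json
import zlib
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]
Message = Dict[str, Any]  # hypothetical layout: {"payload": ..., "crc": int}
Handler = Callable[[Dict[str, Any], Payload], List[Payload]]


class CorruptionDetected(Exception):
    """Raised when a message checksum fails or the two local executions disagree."""


def checksum(payload: Payload) -> int:
    # Canonical JSON encoding (sorted keys) so equal payloads give equal CRCs.
    return zlib.crc32(json.dumps(payload, sort_keys=True).encode("utf-8"))


class HardenedProcess:
    """Hypothetical wrapper: duplicated local state plus checksummed messages."""

    def __init__(self, initial_state: Dict[str, Any], handler: Handler) -> None:
        # Two independent copies of the process state on the same machine.
        self.state_a = copy.deepcopy(initial_state)
        self.state_b = copy.deepcopy(initial_state)
        self.handler = handler  # must be deterministic for the copies to agree

    def receive(self, message: Message) -> List[Message]:
        payload = message["payload"]
        # Reject messages corrupted in transit or by the sender after checksumming.
        if checksum(payload) != message["crc"]:
            raise CorruptionDetected("incoming message failed checksum")
        # Run the handler once per state copy, each on its own copy of the input.
        out_a = self.handler(self.state_a, copy.deepcopy(payload))
        out_b = self.handler(self.state_b, copy.deepcopy(payload))
        # Corruption of either state copy is caught here once it affects output.
        if out_a != out_b:
            raise CorruptionDetected("local executions diverged")
        return [{"payload": p, "crc": checksum(p)} for p in out_a]


def counter_handler(state: Dict[str, Any], payload: Payload) -> List[Payload]:
    # Toy crash-tolerant logic: add an increment and report the new value.
    state["count"] += payload["inc"]
    return [{"count": state["count"]}]


if __name__ == "__main__":
    proc = HardenedProcess({"count": 0}, counter_handler)
    msg = {"payload": {"inc": 2}, "crc": checksum({"inc": 2})}
    print(proc.receive(msg))
    proc.state_a["count"] = 99  # simulate a bit flip in one state copy
    try:
        proc.receive(msg)
    except CorruptionDetected as err:
        print("detected:", err)
```

Comparing only the handler outputs, rather than the full state after every event, keeps the check cheap: a corruption that never influences a message cannot harm other processes. The cost is roughly doubled local computation, which is the kind of overhead one would weigh against replicating the process on additional servers.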