In distributed systems, errors such as data corruption or arbitrary changes to program control flow can cause processes to propagate incorrect state across the system. An efficient and effective way to prevent such error propagation is to harden processes against Arbitrary State Corruption (ASC) faults through local detection, without replication. For distributed systems designed from scratch, handling state corruption can be made fully transparent, provided that developers follow a few concrete design patterns. In this paper, we discuss the problem of transparently hardening the existing code bases of distributed systems. Such systems were not designed with ASC hardening in mind, so they do not necessarily follow the required design patterns. For them, we focus on both performance and the number of changes required to the existing code base. Using memcached as an example, we identify and discuss three areas of improvement: reducing the memory overhead, improving access to state variables, and supporting multi-threading. Our initial evaluation shows that our ASC-hardened version of memcached achieves roughly 76% of the throughput of stock memcached with 128-byte and 1-kilobyte messages.
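The idea of local detection without replication can be illustrated with a minimal sketch. This is not the hardening scheme evaluated in the paper, nor memcached code: the names `HardenedInt`, `CorruptionDetected`, and `demo` are hypothetical, and guarding each state variable with a CRC32 is an assumption chosen only to keep the example short.

```python
import struct
import zlib


class CorruptionDetected(Exception):
    """Raised when a state variable fails its integrity check."""


class HardenedInt:
    """Hypothetical wrapper: an integer stored alongside a CRC32 of its bytes."""

    def __init__(self, value):
        self.store(value)

    @staticmethod
    def _checksum(value):
        # Pack as a signed 64-bit integer so the checksum covers a fixed layout.
        return zlib.crc32(struct.pack("<q", value))

    def store(self, value):
        # Every legitimate write updates the value and its checksum together.
        self._value = value
        self._crc = self._checksum(value)

    def load(self):
        # Local detection: recompute the checksum before the value is used.
        if self._checksum(self._value) != self._crc:
            raise CorruptionDetected("state variable failed its integrity check")
        return self._value


def demo():
    counter = HardenedInt(41)
    counter.store(counter.load() + 1)
    clean_value = counter.load()

    # Simulate a memory bit flip by changing the value without going
    # through store(), so the stored checksum is now stale.
    counter._value ^= 1 << 3
    try:
        counter.load()
        detected = False
    except CorruptionDetected:
        detected = True
    return clean_value, detected


if __name__ == "__main__":
    print(demo())
```

In a sketch like this, a process that catches `CorruptionDetected` would stop before sending any message derived from the corrupted value, which is how local detection keeps the error from propagating. It also shows where overhead comes from: each variable carries extra bytes, and each access pays for a check.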