Towards transparent hardening of distributed systems

Authors:
Diogo Behrens;Christof Fetzer;Flavio P. Junqueira;Marco Serafini
Affiliations:
TU Dresden, Dresden, Germany;TU Dresden, Dresden, Germany;Microsoft Research, Cambridge, UK;Qatar Computing Research Institute, Doha, Qatar
Venue:
Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
Year:
2013

Abstract

In distributed systems, errors such as data corruption or arbitrary changes to the flow of programs might cause processes to propagate incorrect state across the system. To prevent error propagation in such systems, an efficient and effective technique is to harden processes against Arbitrary State Corruption (ASC) faults through local detection, without replication. For distributed systems designed from scratch, dealing with state corruption can be made fully transparent, but this requires that developers follow a few concrete design patterns. In this paper, we discuss the problem of hardening existing code bases of distributed systems transparently. Existing systems have not been designed with ASC hardening in mind, so they do not necessarily follow the required design patterns. For such systems, we focus here on both performance and the number of changes to the existing code base. Using memcached as an example, we identify and discuss three areas of improvement: reducing the memory overhead, improving access to state variables, and supporting multi-threading. Our initial evaluation of memcached shows that our ASC-hardened version obtains a throughput that is roughly 76% of the throughput of stock memcached with 128-byte and 1k-byte messages.
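
As a rough illustration of what "local detection, without replication" can mean, the following minimal sketch keeps two copies of every state variable inside a single process and attaches a checksum to outgoing messages, so that a corrupted copy or a corrupted message is detected before it propagates. It is not the paper's implementation: the names `HardenedState`, `CorruptionDetected`, `seal`, and `unseal` are hypothetical, and the use of a duplicated dictionary plus CRC32 is an assumption chosen for brevity.

```python
import zlib


class CorruptionDetected(Exception):
    """Raised when the two state copies, or a message checksum, disagree."""


class HardenedState:
    """Hypothetical wrapper: stores each state variable twice in one process."""

    def __init__(self):
        self._primary = {}
        self._shadow = {}

    def write(self, key, value):
        # Every update is applied to both copies.
        self._primary[key] = value
        self._shadow[key] = value

    def read(self, key):
        # A read succeeds only if both copies still agree.
        a = self._primary.get(key)
        b = self._shadow.get(key)
        if a != b:
            raise CorruptionDetected(f"state variable {key!r} diverged")
        return a


def seal(payload: bytes) -> bytes:
    """Append a CRC32 so the receiver can reject a corrupted message."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unseal(message: bytes) -> bytes:
    """Verify and strip the CRC32 appended by seal()."""
    payload, crc = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        raise CorruptionDetected("message checksum mismatch")
    return payload
```

The sketch also hints at why the abstract lists memory overhead and access to state variables as areas of improvement: duplicating state doubles its footprint, and every read goes through a comparison.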