Automatically Tolerating Arbitrary Faults in Non-malicious Settings

Authors:
Diogo Behrens;Stefan Weigert;Christof Fetzer
Venue:
LADC '13 Proceedings of the 2013 Sixth Latin-American Symposium on Dependable Computing
Year:
2013

Abstract

Arbitrary faults such as bit flips have often been observed in commodity-hardware data centers and have disrupted large services. Nevertheless, practical fault-tolerant distributed systems usually assume only benign faults, such as crashes and message omissions. Algorithms that tolerate arbitrary faults are harder to understand and more expensive to deploy because they require more machines. In this work, we introduce a non-malicious arbitrary fault model that includes transient and permanent arbitrary faults, such as bit flips and hardware-design errors, but excludes malicious faults, which are typically caused by security breaches. We then present a compiler-based framework that allows benign fault-tolerant algorithms to automatically tolerate arbitrary faults in non-malicious settings. Finally, we experimentally evaluate two fundamental algorithms: Paxos and leader election. At the expense of CPU cycles, the transformed algorithms use the same number of processes as their benign fault-tolerant counterparts and have virtually no network overhead, while reducing the probability of failing arbitrarily by two orders of magnitude.
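
The abstract does not describe the hardening mechanism itself, so the following is only a generic illustration of the underlying idea: detect a non-malicious corruption locally and turn it into a crash, which a benign fault-tolerant algorithm already handles. It assumes a simple duplicated-state check and a CRC32 on messages; it is not the authors' compiler transformation, and the names `seal`, `unseal`, `HardenedCounter`, and `FaultDetected` are hypothetical.

```python
import zlib


class FaultDetected(Exception):
    """Raised to turn a detected corruption into a crash-like stop."""


def seal(payload: bytes) -> bytes:
    # Append a CRC32 so the receiver can detect a corrupted message.
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unseal(message: bytes) -> bytes:
    # Split payload and checksum, then verify before using the payload.
    payload, tag = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != tag:
        raise FaultDetected("message checksum mismatch")
    return payload


class HardenedCounter:
    """Toy process (hypothetical) that keeps two copies of its state."""

    def __init__(self) -> None:
        self.primary = 0
        self.shadow = 0

    def handle(self, message: bytes) -> bytes:
        value = int.from_bytes(unseal(message), "big")
        # Apply the update to both copies, then compare them.
        self.primary += value
        self.shadow += value
        if self.primary != self.shadow:
            raise FaultDetected("state copies diverged")
        return seal(self.primary.to_bytes(8, "big"))


if __name__ == "__main__":
    process = HardenedCounter()
    reply = process.handle(seal((5).to_bytes(8, "big")))
    print(int.from_bytes(unseal(reply), "big"))

    # Simulate a bit flip in transit: the process stops instead of
    # acting on the corrupted value.
    corrupted = bytearray(seal((3).to_bytes(8, "big")))
    corrupted[0] ^= 0x01
    try:
        process.handle(bytes(corrupted))
    except FaultDetected as err:
        print("stopped:", err)
```

In this sketch the cost is extra computation per message (a second state update and a checksum), while the number of processes stays the same. That mirrors the trade-off the abstract reports, though the detection coverage of such a simple scheme is not comparable to the evaluated framework.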