Practical hardening of crash-tolerant systems

  • Authors:
  • Miguel Correia;Daniel Gómez Ferro;Flavio P. Junqueira;Marco Serafini

  • Affiliations:
  • IST-UTL, INESC-ID, Lisbon, Portugal;Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain

  • Venue:
  • USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent failures of production systems have highlighted the importance of tolerating faults beyond crashes. The industry has so far addressed this problem by hardening crash-tolerant systems with ad hoc error detection checks, potentially overlooking critical fault scenarios. We propose a generic and principled hardening technique for Arbitrary State Corruption (ASC) faults, which specifically model the effects of realistic data corruptions on distributed processes. Hardening does not require the use of trusted components or the replication of the process over multiple physical servers. We implemented a wrapper library to transparently harden distributed processes. To exercise our library and evaluate our technique, we obtained ASC-tolerant versions of Paxos, of a subset of the ZooKeeper API, and of an eventually consistent storage by implementing crash-tolerant protocols and automatically hardening them using our library. Our evaluation shows that the throughput of our ASC-hardened state machine replication outperforms its Byzantine-tolerant counterpart by up to 70%.