Paxos replicated state machines as the basis of a high-performance data store

  • Authors:
  • William J. Bolosky;Dexter Bradshaw;Randolph B. Haagens;Norbert P. Kusters;Peng Li

  • Affiliations:
  • Microsoft Research;Microsoft;Microsoft;Microsoft;Microsoft

  • Venue:
  • Proceedings of the 8th USENIX conference on Networked systems design and implementation
  • Year:
  • 2011

Abstract

Conventional wisdom holds that Paxos is too expensive to use for high-volume, high-throughput, data-intensive applications. Consequently, fault-tolerant storage systems typically rely on special hardware, semantics weaker than sequential consistency, a limited update interface (such as append-only), primary-backup replication schemes that serialize all reads through the primary, clock synchronization for correctness, or some combination thereof. We demonstrate that a Paxos-based replicated state machine implementing a storage service can achieve performance close to the limits of the underlying hardware while tolerating arbitrary machine restarts, some permanent machine or disk failures and a limited set of Byzantine faults. We also compare it with two versions of primary-backup. The replicated state machine can serve as the data store for a file system or storage array. We present a novel algorithm for ensuring read consistency without logging, along with a sketch of a proof of its correctness.