Consistency and fault tolerance for erasure-coded distributed storage systems

  • Authors:
  • Kathrin Peter; Alexander Reinefeld

  • Affiliations:
  • Humboldt-Universität zu Berlin & Zuse-Institut Berlin, Berlin, Germany; Humboldt-Universität zu Berlin & Zuse-Institut Berlin, Berlin, Germany

  • Venue:
  • Proceedings of the fifth international workshop on Data-Intensive Distributed Computing Date
  • Year:
  • 2012

Abstract

One challenge in applying erasure codes (or error-correcting codes) to distributed storage systems is keeping data and redundancy blocks consistent when servers crash. We present two access protocols that provide sequential consistency and maximum distance separable (MDS) fault tolerance at the same time. Both protocols use sequence numbers to recover a consistent version after failures or partial writes. The first, pessimistic protocol (PSW) uses one master per stripe to execute updates in sequence. The second, optimistic protocol (OCW) lets concurrent writes to blocks in the same stripe proceed in parallel at the cost of additional buffer space. We present empirical performance results for PSW and OCW and compare them to other protocols. Our results show that OCW is as fast as simple replication while providing better fault tolerance, reduced storage overhead, or both. This demonstrates that erasure coding can be a space-efficient alternative to replication in distributed storage systems.
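
To make the consistency problem concrete, the following is a minimal sketch and not the PSW or OCW protocol from the paper. It assumes the simplest MDS code (k data blocks plus one XOR parity block, tolerating one lost block) and a hypothetical bookkeeping scheme in which every data block carries a sequence number and the parity block records the sequence number it has absorbed for each data block. The names `Stripe`, `write_data`, `apply_parity`, and `recover` are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class Stripe:
    """k data blocks plus one XOR parity block (hypothetical illustration)."""

    def __init__(self, data):
        self.data = [bytes(b) for b in data]
        # Sequence number stored alongside each data block.
        self.data_seq = [0] * len(self.data)
        self.parity = xor_blocks(self.data)
        # Per data block: the sequence number already reflected in the parity.
        self.parity_seq = [0] * len(self.data)

    def write_data(self, i, new):
        """First half of an update: overwrite data block i, return the parity delta."""
        delta = xor_blocks([self.data[i], bytes(new)])
        self.data[i] = bytes(new)
        self.data_seq[i] += 1
        return delta

    def apply_parity(self, i, delta):
        """Second half of an update: fold the delta for block i into the parity."""
        self.parity = xor_blocks([self.parity, delta])
        self.parity_seq[i] += 1

    def is_consistent(self):
        """True when the parity reflects exactly the current data blocks."""
        return self.data_seq == self.parity_seq

    def recover(self, lost):
        """Rebuild data block `lost` from the surviving data blocks and the parity.

        Only valid if every surviving data block matches the version the
        parity reflects; otherwise the XOR would yield garbage.
        """
        for j in range(len(self.data)):
            if j != lost and self.data_seq[j] != self.parity_seq[j]:
                raise ValueError(f"block {j} and parity disagree on version")
        survivors = [b for j, b in enumerate(self.data) if j != lost]
        return xor_blocks(survivors + [self.parity])


# A complete update keeps the stripe recoverable.
stripe = Stripe([b"ab", b"cd", b"ef"])
delta = stripe.write_data(1, b"XY")
stripe.apply_parity(1, delta)
rebuilt = stripe.recover(1)

# A partial write (data updated, parity not) is visible in the sequence numbers.
partial = Stripe([b"ab", b"cd", b"ef"])
partial.write_data(0, b"zz")
```

In this sketch, a crash between `write_data` and `apply_parity` leaves the sequence numbers mismatched, so recovery of any other block is refused rather than returning corrupt data. Recovering the half-written block itself yields its previous version, because the parity still describes the old stripe.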