Consistent and automatic replica regeneration

  • Authors:
  • Haifeng Yu;Amin Vahdat

  • Affiliations:
  • Intel Research Pittsburgh/Carnegie Mellon University;University of California, San Diego

  • Venue:
  • NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
  • Year:
  • 2004

Abstract

Reducing management costs and improving the availability of large-scale distributed systems require automatic replica regeneration, i.e., creating new replicas in response to replica failures. A major challenge to regeneration is maintaining consistency when the replica group changes. Doing so is particularly difficult across the wide area where failure detection is complicated by network congestion and node overload. In this context, this paper presents Om, the first read/write peer-to-peer wide-area storage system that achieves high availability and manageability through online automatic regeneration while still preserving consistency guarantees. We achieve these properties through the following techniques. First, by utilizing the limited view divergence property in today's Internet and by adopting the witness model, Om is able to regenerate from any single replica rather than requiring a majority quorum, at the cost of a small (10^-6 in our experiments) probability of violating consistency. As a result, Om can deliver high availability with a small number of replicas, while traditional designs would significantly increase the number of replicas. Next, we distinguish failure-free reconfigurations from failure-induced ones, enabling common reconfigurations to proceed with a single round of communication. Finally, we use a lease graph among the replicas and a two-phase write protocol to optimize for reads, and reads in Om can be processed by any single replica. Experiments on PlanetLab show that consistent regeneration in Om completes in approximately 20 seconds.
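
The availability argument in the abstract can be illustrated with a rough back-of-the-envelope sketch. It is not Om's own analysis: it simply compares the chance that regeneration can proceed when a majority of n replicas must be reachable against the chance when any single replica suffices. The assumption that replicas are reachable independently with probability p, the function names, and the sample values of n and p are all illustrative and do not come from the paper.

```python
from math import comb


def p_at_least(n: int, k: int, p: float) -> float:
    """Probability that at least k of n replicas are reachable,
    assuming each is independently reachable with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def majority_available(n: int, p: float) -> float:
    """Regeneration requires a strict majority of the n replicas."""
    return p_at_least(n, n // 2 + 1, p)


def any_single_available(n: int, p: float) -> float:
    """Regeneration can start from any one reachable replica."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    for n in (3, 5, 7):
        print(n, majority_available(n, 0.9), any_single_available(n, 0.9))
```

Under these assumptions, the single-replica requirement is never harder to satisfy than the majority requirement for the same n, which is the intuition behind needing fewer replicas for a given availability target.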