Consistent and automatic replica regeneration

Authors:
Haifeng Yu;Amin Vahdat
Affiliations:
Intel Research Pittsburgh/Carnegie Mellon University;University of California, San Diego
Venue:
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Year:
2004

References: 27
Cited by: 4

Reliable communication in the presence of failures

ACM Transactions on Computer Systems (TOCS)
Leases: an efficient fault-tolerant mechanism for distributed file cache consistency

SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Linearizability: a correctness condition for concurrent objects

ACM Transactions on Programming Languages and Systems (TOPLAS)
Implementing fault-tolerant services using the state machine approach: a tutorial

ACM Computing Surveys (CSUR)
Using process groups to implement failure detection in asynchronous environments

PODC '91 Proceedings of the tenth annual ACM symposium on Principles of distributed computing
Optimal time randomized consensus—making resilient algorithms fast in practice

SODA '91 Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms
Cluster-based scalable network services

Proceedings of the sixteenth ACM symposium on Operating systems principles
The part-time parliament

ACM Transactions on Computer Systems (TOCS)
Manageability, availability and performance in Porcupine: a highly scalable, cluster-based mail service

Proceedings of the seventeenth ACM symposium on Operating systems principles
OceanStore: an architecture for global-scale persistent storage

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Chord: A scalable peer-to-peer lookup service for internet applications

Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
The costs and limits of availability for replicated services

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Resilient overlay networks

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Wide-area cooperative storage with CFS

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Group communication specifications: a comprehensive study

ACM Computing Surveys (CSUR)
Distributed Algorithms

Distributed Algorithms
Distributed Operating Systems and Algorithms

Distributed Operating Systems and Algorithms
Disk Paxos

DISC '00 Proceedings of the 14th International Conference on Distributed Computing
RAMBO: A Reconfigurable Atomic Memory Service for Dynamic Networks

DISC '02 Proceedings of the 16th International Conference on Distributed Computing
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems

Middleware '01 Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg
Farsite: federated, available, and reliable storage for an incompletely trusted environment

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Taming aggressive replication in the Pangaea wide-area file system

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Ivy: a read/write peer-to-peer file system

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
TCP Nice: a mechanism for background transfers

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Proactive recovery in a Byzantine-fault-tolerant system

OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Pond: the OceanStore prototype

FAST'03 Proceedings of the 2nd USENIX conference on File and storage technologies

The SMART way to migrate replicated stateful services

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
FUSE: lightweight guaranteed distributed failure notification

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
MUREX: a mutable replica control scheme for structured peer-to-peer storage systems

GPC'06 Proceedings of the First international conference on Advances in Grid and Pervasive Computing
Agent based cloud storage system

AIC'10/BEBI'10 Proceedings of the 10th WSEAS international conference on applied informatics and communications, and 3rd WSEAS international conference on Biomedical electronics and biomedical informatics

Abstract

Reducing management costs and improving the availability of large-scale distributed systems require automatic replica regeneration, i.e., creating new replicas in response to replica failures. A major challenge to regeneration is maintaining consistency when the replica group changes. Doing so is particularly difficult across the wide area, where failure detection is complicated by network congestion and node overload. In this context, this paper presents Om, the first read/write peer-to-peer wide-area storage system that achieves high availability and manageability through online automatic regeneration while still preserving consistency guarantees. We achieve these properties through the following techniques. First, by utilizing the limited view divergence property in today's Internet and by adopting the witness model, Om is able to regenerate from any single replica rather than requiring a majority quorum, at the cost of a small (10^-6 in our experiments) probability of violating consistency. As a result, Om can deliver high availability with a small number of replicas, while traditional designs would significantly increase the number of replicas. Next, we distinguish failure-free reconfigurations from failure-induced ones, enabling common reconfigurations to proceed with a single round of communication. Finally, we use a lease graph among the replicas and a two-phase write protocol to optimize for reads, and reads in Om can be processed by any single replica. Experiments on PlanetLab show that consistent regeneration in Om completes in approximately 20 seconds.
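
The abstract's claim that single-replica regeneration needs fewer replicas than a majority quorum can be illustrated with a back-of-envelope calculation. The sketch below is not Om's protocol and does not come from the paper. It assumes that replicas fail independently and that each is reachable with the same probability p. The value p = 0.9, the 0.995 availability target, and all function names are hypothetical choices for illustration. It also ignores the witness model and the small probability of violating consistency mentioned above.

```python
from math import comb
from typing import Callable, Optional


def majority_available(n: int, p: float) -> float:
    """Probability that a strict majority of n replicas is reachable.

    Assumes independent failures; p is the per-replica availability.
    """
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def any_available(n: int, p: float) -> float:
    """Probability that at least one of n replicas is reachable."""
    return 1 - (1 - p) ** n


def replicas_needed(
    target: float,
    p: float,
    avail: Callable[[int, float], float],
    max_n: int = 50,
) -> Optional[int]:
    """Smallest group size whose availability meets the target, or None."""
    for n in range(1, max_n + 1):
        if avail(n, p) >= target:
            return n
    return None


if __name__ == "__main__":
    p, target = 0.9, 0.995  # hypothetical values, not taken from the paper
    print("any single replica:", replicas_needed(target, p, any_available))
    print("majority quorum:   ", replicas_needed(target, p, majority_available))
```

Under these assumed numbers, progress from any single replica reaches the target with 3 replicas, while a majority quorum needs 7. The gap is the kind of replica-count saving the abstract refers to, though the real figures depend on failure correlation and on the measured availability of wide-area nodes.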