Leveraging sharding in the design of scalable replication protocols

Authors:
Hussam Abu-Libdeh;Robbert van Renesse;Ymir Vigfusson
Affiliations:
Cornell University;Cornell University;Reykjavik University
Venue:
Proceedings of the 4th annual Symposium on Cloud Computing
Year:
2013

Abstract

Most, if not all, datacenter services use sharding and replication for scalability and reliability. Shards are more or less independent of one another and individually replicated. In this paper, we challenge this design philosophy and present a replication protocol where the shards interact with one another: a protocol running within shards ensures linearizable consistency, while the shards interact in order to improve availability. We provide a specification for the protocol, prove its safety, analyze its liveness and availability properties, and evaluate a working implementation.