Stronger semantics for low-latency geo-replicated storage

Authors:
Wyatt Lloyd; Michael J. Freedman; Michael Kaminsky; David G. Andersen
Affiliations:
Princeton University; Princeton University; Intel Labs; Carnegie Mellon University
Venue:
NSDI '13: Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation
Year:
2013

Abstract

We present the first scalable, geo-replicated storage system that guarantees low latency, offers a rich data model, and provides "stronger" semantics. Namely, all client requests are satisfied in the local datacenter in which they arise; the system efficiently supports useful data-model abstractions such as column families and counter columns; and clients can access data in a causally consistent fashion with read-only and write-only transactional support, even for keys spread across many servers. The primary contributions of this work are enabling scalable causal consistency for the complex column-family data model, as well as novel, non-blocking algorithms for both read-only and write-only transactions. Our evaluation shows that our system, Eiger, achieves low latency (single-digit milliseconds), has throughput competitive with eventually consistent and non-transactional Cassandra (less than 7% overhead for one of Facebook's real-world workloads), and scales out to large clusters almost linearly (throughput increases by an average of 96% each time the cluster size doubles, up to 128-server clusters).
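The abstract does not say how causal consistency is enforced when writes replicate between datacenters. The sketch below illustrates one common technique, explicit dependency tracking: a remote replica buffers a replicated write until every write it causally depends on is already visible locally. It is a hypothetical, single-process illustration that assumes integer version numbers and last-writer-wins conflict resolution. The names `Write`, `Replica`, `receive`, and `read` are invented here. This is not Eiger's protocol, and it omits column families, counters, and transactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    version: int        # hypothetical globally ordered version (e.g. a Lamport timestamp)
    deps: tuple = ()    # (key, version) pairs this write causally depends on


class Replica:
    """One datacenter's copy of the data (illustrative only, not Eiger's design)."""

    def __init__(self):
        self.store = {}     # key -> (version, value)
        self.pending = []   # replicated writes whose dependencies are not yet visible

    def _visible(self, key, version):
        # A dependency is satisfied once this replica holds the key at
        # that version or a newer one.
        current = self.store.get(key)
        return current is not None and current[0] >= version

    def _apply(self, w):
        # Last-writer-wins: keep only the highest version of each key.
        current = self.store.get(w.key)
        if current is None or w.version > current[0]:
            self.store[w.key] = (w.version, w.value)

    def receive(self, w):
        """Accept a write replicated from another datacenter."""
        self.pending.append(w)
        progress = True
        while progress:
            # Applying one write may unblock others, so repeat until no change.
            progress = False
            for p in list(self.pending):
                if all(self._visible(k, v) for k, v in p.deps):
                    self.pending.remove(p)
                    self._apply(p)
                    progress = True

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


if __name__ == "__main__":
    # Alice restricts her ACL, then posts; the post depends on the ACL change.
    acl = Write("acl:alice", "friends-only", version=1)
    post = Write("post:alice", "new job!", version=2, deps=(("acl:alice", 1),))

    remote = Replica()
    remote.receive(post)                # arrives out of order
    print(remote.read("post:alice"))    # None: buffered until its dependency arrives
    remote.receive(acl)
    print(remote.read("post:alice"))    # new job!
```

Because the post is held back until the ACL change is visible, no reader at the remote datacenter can see the post under the old, more permissive ACL. This is the kind of anomaly causal consistency rules out and eventual consistency does not.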