Using Paxos to build a scalable, consistent, and highly available datastore

Authors:
Jun Rao;Eugene J. Shekita;Sandeep Tata
Affiliations:
LinkedIn Corporation;IBM Almaden Research Center;IBM Almaden Research Center
Venue:
Proceedings of the VLDB Endowment
Year:
2011



Abstract

Spinnaker is an experimental datastore designed to run on a large cluster of commodity servers in a single datacenter. It features key-based range partitioning, 3-way replication, and a transactional get-put API with the option to choose either strong or timeline consistency on reads. This paper describes Spinnaker's Paxos-based replication protocol. The use of Paxos ensures that a data partition in Spinnaker will be available for reads and writes as long as a majority of its replicas are alive. Unlike traditional master-slave replication, this holds regardless of the failure sequence that occurs. We show that Paxos replication can be competitive with alternatives that provide weaker consistency guarantees. Compared to an eventually consistent datastore, Spinnaker can be as fast or even faster on reads and only 5% to 10% slower on writes.
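
The availability condition and the key-based range partitioning mentioned in the abstract can be illustrated with a minimal Python sketch. This is not Spinnaker's code: the function names (`majority`, `partition_available`, `partition_for_key`) and the example split points are hypothetical, and only the 3-way replication factor and the majority rule are taken from the abstract.

```python
from bisect import bisect_right
from typing import Sequence

REPLICATION_FACTOR = 3  # from the abstract: 3-way replication


def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def partition_available(alive_replicas: int, n: int = REPLICATION_FACTOR) -> bool:
    """Majority rule from the abstract: a partition stays available for
    reads and writes as long as a majority of its replicas are alive."""
    return alive_replicas >= majority(n)


def partition_for_key(key: str, split_points: Sequence[str]) -> int:
    """Hypothetical range-partition lookup: with sorted split points,
    partition i holds keys k with split_points[i-1] <= k < split_points[i]."""
    return bisect_right(split_points, key)


if __name__ == "__main__":
    # Assumed example layout: two split points give three key ranges.
    splits = ["g", "p"]
    for key in ("apple", "grape", "zebra"):
        print(key, "-> partition", partition_for_key(key, splits))
    for alive in (3, 2, 1):
        print(alive, "replicas alive -> available:", partition_available(alive))
```

With three replicas the majority is two, so under this rule a partition tolerates the loss of any single replica but not of two.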