Chain replication in theory and in practice

Authors:
Scott Lystig Fritchie
Affiliations:
Gemini Mobile Technologies, Inc., Minneapolis, MN, USA
Venue:
Proceedings of the 9th ACM SIGPLAN workshop on Erlang
Year:
2010

Abstract

When implementing a distributed storage system, using an algorithm with a formal definition and proof is a wise idea. However, translating any algorithm into effective code can be difficult because the implementation must be both correct and fast. This paper is a case study of the implementation of the chain replication protocol in a distributed key-value store called Hibari. In theory, the chain replication algorithm is quite simple and should be straightforward to implement correctly. In practice, however, there were many implementation details that had effects both profound and subtle. The Erlang community, as well as distributed systems implementors in general, can use the lessons learned with Hibari (specifically in areas of performance enhancements and failure detection) to avoid many dangers that lurk at the interface between theory and real-world computing.