On brewing fresh espresso: LinkedIn's distributed data serving platform

Authors:
Lin Qiao;Kapil Surlaker;Shirshanka Das;Tom Quiggle;Bob Schulman;Bhaskar Ghosh;Antony Curtis;Oliver Seeliger;Zhen Zhang;Aditya Auradkar;Chris Beaver;Gregory Brandt;Mihir Gandhi;Kishore Gopalakrishna;Wai Ip;Swaroop Jagadish;Shi Lu;Alexander Pachev;Aditya Ramesh;Abraham Sebastian;Rupa Shanbhag;Subbu Subramaniam;Yun Sun;Sajid Topiwala;Cuong Tran;Jemiah Westerman;David Zhang
Affiliations:
LinkedIn, Inc, Mountain View, USA (all authors)
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Abstract

Espresso is a document-oriented distributed data serving platform built to address LinkedIn's requirements for a scalable, performant, source-of-truth primary store. It provides a hierarchical document model, transactional support for modifications to related documents, real-time secondary indexing, on-the-fly schema evolution, and a timeline-consistent change capture stream. This paper describes the motivation and design principles involved in building Espresso, the data model and capabilities exposed to clients, and details of the replication and secondary indexing implementation, and presents a set of experimental results that characterize the performance of the system along various dimensions. When we set out to build Espresso, we chose to apply industry best practices, published research, and our own internal experience with different consistency models. Along the way, we built a novel generic distributed cluster management framework, a partition-aware change-capture pipeline, and a high-performance inverted index implementation.