On brewing fresh espresso: LinkedIn's distributed data serving platform

  • Authors:
  • Lin Qiao;Kapil Surlaker;Shirshanka Das;Tom Quiggle;Bob Schulman;Bhaskar Ghosh;Antony Curtis;Oliver Seeliger;Zhen Zhang;Aditya Auradar;Chris Beaver;Gregory Brandt;Mihir Gandhi;Kishore Gopalakrishna;Wai Ip;Swaroop Jgadish;Shi Lu;Alexander Pachev;Aditya Ramesh;Abraham Sebastian;Rupa Shanbhag;Subbu Subramaniam;Yun Sun;Sajid Topiwala;Cuong Tran;Jemiah Westerman;David Zhang

  • Affiliations:
  • LinkedIn, Inc, Mountain View, USA (all authors)

  • Venue:
  • Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 2013

Abstract

Espresso is a document-oriented distributed data serving platform built to address LinkedIn's requirements for a scalable, performant, source-of-truth primary store. It provides a hierarchical document model, transactional support for modifications to related documents, real-time secondary indexing, on-the-fly schema evolution, and a timeline-consistent change-capture stream. This paper describes the motivation and design principles involved in building Espresso, the data model and capabilities exposed to clients, and details of the replication and secondary indexing implementation, and presents a set of experimental results that characterize the performance of the system along various dimensions. When we set out to build Espresso, we chose to apply industry best practices, published research, and our own internal experience with different consistency models. Along the way, we built a novel generic distributed cluster management framework, a partition-aware change-capture pipeline, and a high-performance inverted index implementation.