Espresso is a document-oriented distributed data serving platform built to address LinkedIn's requirements for a scalable, performant, source-of-truth primary store. It provides a hierarchical document model, transactional support for modifications to related documents, real-time secondary indexing, on-the-fly schema evolution, and a timeline-consistent change capture stream. This paper describes the motivation and design principles behind Espresso, the data model and capabilities exposed to clients, and the details of the replication and secondary indexing implementation, and presents experimental results that characterize the system's performance along various dimensions. When we set out to build Espresso, we chose to apply industry best practices, published research, and our own internal experience with different consistency models. Along the way, we built a novel generic distributed cluster management framework, a partition-aware change-capture pipeline, and a high-performance inverted index implementation.