Interpolation search—a log logN search
Communications of the ACM
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Understanding MySQL Internals
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference
Dynamo: amazon's highly available key-value store
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Efficient bulk insertion into a distributed ordered table
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
PNUTS: Yahoo!'s hosted data serving platform
Proceedings of the VLDB Endowment
Distributed indexing of web scale datasets for the cloud
Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
Benchmarking cloud serving systems with YCSB
Proceedings of the 1st ACM symposium on Cloud computing
Parallel bulk insertion for large-scale analytics applications
Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware
Distributed indexing for semantic search
Proceedings of the 3rd International Semantic Search Workshop
The Hadoop Distributed File System
MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
A batch of PNUTS: experiences connecting cloud batch and serving systems
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
LazyBase: trading freshness for performance in a scalable database
Proceedings of the 7th ACM european conference on Computer Systems
Avatara: OLAP for web-scale analytics products
Proceedings of the VLDB Endowment
Metaphor: a system for related search recommendations
Proceedings of the 21st ACM international conference on Information and knowledge management
ElastMan: autonomic elasticity manager for cloud-based key-value stores
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
The big data ecosystem at LinkedIn
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
DBalancer: distributed load balancing for NoSQL data-stores
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Large-scale social recommender systems: challenges and opportunities
Proceedings of the 22nd international conference on World Wide Web companion
ElastMan: elasticity manager for elastic key-value stores in the cloud
Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
Hi-index | 0.00 |
Current serving systems lack the ability to bulk load massive immutable data sets without affecting serving performance. The performance degradation is largely due to index creation and modification as CPU and memory resources are shared with request serving. We have extended Project Voldemort, a general-purpose distributed storage and serving system inspired by Amazon's Dynamo, to support bulk loading terabytes of read-only data. This extension constructs the index offline, by leveraging the fault tolerance and parallelism of Hadoop. Compared to MySQL, our compact storage format and data deployment pipeline scales to twice the request throughput while maintaining sub 5 ms median latency. At LinkedIn, the largest professional social network, this system has been running in production for more than 2 years and serves many of the data-intensive social features on the site.