Parallel database systems: the future of high performance database systems
Communications of the ACM
ACM Transactions on Computer Systems (TOCS)
Load balancing for term-distributed parallel retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
The impact of caching on search engines
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
On designing and deploying internet-scale services
LISA'07 Proceedings of the 21st conference on Large Installation System Administration Conference
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Introduction to Information Retrieval
Introduction to Information Retrieval
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
ZooKeeper: wait-free coordination for internet-scale systems
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
Information Retrieval: Implementing and Evaluating Search Engines
Information Retrieval: Implementing and Evaluating Search Engines
Availability in globally distributed storage systems
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Fast data in the era of big data: Twitter's real-time related query suggestion architecture
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
WTF: the who to follow service at Twitter
Proceedings of the 22nd international conference on World Wide Web
Hi-index | 0.00 |
Low-latency, high-throughput web services are typically achieved through partitioning, replication, and caching. Although these strategies and the general design of large-scale distributed search systems are well known, the academic literature provides surprisingly few details on deployment and operational considerations in production environments. In this paper, we address this gap by sharing the distributed search architecture that underlies Twitter user search, a service for discovering relevant accounts on the popular microblogging service. Our design makes use of the principle that eliminates the distinction between failure and other anticipated service disruptions: as a result, most operational scenarios share exactly the same code path. This simplicity leads to greater robustness and fault-tolerance. Another salient feature of our architecture is its exclusive reliance on open-source software components, which makes it easier for the community to learn from our experiences and replicate our findings.