On building a reusable Twitter corpus
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Swimming against the streamz: search and analytics over the enterprise activity stream
Proceedings of the 21st ACM international conference on Information and knowledge management
Proceedings of the 21st ACM international conference on Information and knowledge management
An in-browser microblog ranking engine
ER'12 Proceedings of the 2012 international conference on Advances in Conceptual Modeling
Pollux: towards scalable distributed real-time search on microblogs
Proceedings of the 16th International Conference on Extending Database Technology
Using Pregel-like Large Scale Graph Processing Frameworks for Social Network Analysis
ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Fast data in the era of big data: Twitter's real-time related query suggestion architecture
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Towards automatic assessment of the social media impact of news content
Proceedings of the 22nd international conference on World Wide Web companion
Dynamic memory allocation policies for postings in real-time Twitter search
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Fast candidate generation for real-time tweet search with bloom filter chains
ACM Transactions on Information Systems (TOIS)
Proceedings of the Seventh Workshop on Programming Languages and Operating Systems
Mobility and social networking: a data management perspective
Proceedings of the VLDB Endowment
Document vector representations for feature extraction in multi-stage document ranking
Information Retrieval
Hi-index | 0.00 |
The web today is increasingly characterized by social and real-time signals, which we believe represent two frontiers in information retrieval. In this paper, we present Early bird, the core retrieval engine that powers Twitter's real-time search service. Although Early bird builds and maintains inverted indexes like nearly all modern retrieval engines, its index structures differ from those built to support traditional web search. We describe these differences and present the rationale behind our design. A key requirement of real-time search is the ability to ingest content rapidly and make it searchable immediately, while concurrently supporting low-latency, high-throughput query evaluation. These demands are met with a single-writer, multiple-reader concurrency model and the targeted use of memory barriers. Early bird represents a point in the design space of real-time search engines that has worked well for Twitter's needs. By sharing our experiences, we hope to spur additional interest and innovation in this exciting space.