We explore a real-time Twitter search application in which tweets arrive at a rate of several thousand per second. Real-time search demands that they be indexed and searchable immediately, which leads to a number of implementation challenges. In this paper, we focus on one aspect: dynamic postings allocation policies for index structures that are held completely in main memory. The core issue can be characterized as a "Goldilocks Problem". Because memory remains a scarce resource today, an allocation policy that is too aggressive leads to inefficient utilization, while a policy that is too conservative is slow and leads to fragmented postings lists. We present a dynamic postings allocation policy that allocates memory in increasingly larger "slices" from a small number of large, fixed pools of memory. With an analytical model and experiments, we explore different settings that balance time (query evaluation speed) and space (memory utilization).
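The slice-based policy described above can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `SlicePools`, the slice sizes (2, 16, 128, 2048), the pool capacity, and the per-term bookkeeping in Python dictionaries are all assumptions chosen for illustration. Each pool is one preallocated array that hands out slices of a single size; a term's postings list starts in the smallest pool and moves to the next larger one each time its current slice fills up.

```python
from array import array

# Assumed slice sizes, one per pool; the abstract does not specify them.
SLICE_SIZES = (2, 16, 128, 2048)


class SlicePools:
    """Toy in-memory postings store that grows lists in larger and larger slices."""

    def __init__(self, slice_sizes=SLICE_SIZES, slices_per_pool=64):
        self.slice_sizes = tuple(slice_sizes)
        # One large, fixed pool per slice size, allocated up front.
        self.pools = [array("I", [0]) * (s * slices_per_pool)
                      for s in self.slice_sizes]
        self.next_free = [0] * len(self.slice_sizes)  # bump pointer per pool
        self.slices = {}  # term -> list of (pool level, start offset)
        self.fill = {}    # term -> postings stored in the term's last slice
        self.total = 0    # postings stored overall

    def _allocate(self, level):
        start = self.next_free[level]
        size = self.slice_sizes[level]
        if start + size > len(self.pools[level]):
            raise MemoryError(f"pool {level} is exhausted")
        self.next_free[level] = start + size
        return (level, start)

    def add(self, term, doc_id):
        chain = self.slices.setdefault(term, [])
        fill = self.fill.get(term, 0)
        if not chain or fill == self.slice_sizes[chain[-1][0]]:
            # First posting, or last slice is full: take a slice from the
            # next larger pool (staying in the largest pool once reached).
            level = min(chain[-1][0] + 1, len(self.slice_sizes) - 1) if chain else 0
            chain.append(self._allocate(level))
            fill = 0
        level, start = chain[-1]
        self.pools[level][start + fill] = doc_id
        self.fill[term] = fill + 1
        self.total += 1

    def postings(self, term):
        chain = self.slices.get(term, [])
        out = []
        for i, (level, start) in enumerate(chain):
            last = i == len(chain) - 1
            n = self.fill[term] if last else self.slice_sizes[level]
            out.extend(self.pools[level][start:start + n])
        return out

    def utilization(self):
        """Fraction of allocated slots that actually hold a posting."""
        allocated = sum(self.next_free)
        return self.total / allocated if allocated else 0.0
```

In this sketch the trade-off from the abstract shows up directly in `SLICE_SIZES`: larger steps mean fewer slices to traverse per postings list at query time but more allocated slots left empty for rare terms, while smaller steps raise `utilization()` at the cost of more fragmented lists.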