Most modern web search engines employ a two-phase ranking strategy: a candidate list of documents is first generated using a "cheap" but low-quality scoring function, and that list is then reranked by an "expensive" but high-quality method (usually machine-learned). This paper focuses on the problem of candidate generation for conjunctive query processing in this context. We describe and evaluate fast, approximate postings list intersection algorithms based on Bloom filters. Because modern learning-to-rank techniques are powerful and the emphasis is on early precision, significant speedups can be achieved without loss of end-to-end retrieval effectiveness. Our explorations reveal a rich design space in which effectiveness and efficiency can be balanced in response to specific hardware configurations and application scenarios.
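To make the general idea concrete, the following is a minimal sketch of approximate conjunctive intersection with Bloom filters. It is not the implementation evaluated in the paper: the names `BloomFilter`, `build_filter`, and `approximate_intersection`, the SHA-256 double-hashing scheme, and the default parameters (16 bits per element, 3 hash functions) are all assumptions chosen for illustration. The sketch walks the shortest postings list and keeps each document id that passes a membership test against a filter built for every other query term.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over integer document ids (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, doc_id):
        # Assumption: derive the k bit positions by double hashing
        # two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(str(doc_id).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, doc_id):
        for pos in self._positions(doc_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, doc_id):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(doc_id)
        )


def build_filter(postings, bits_per_element=16, num_hashes=3):
    """Build a filter for one term's postings list (hypothetical parameters)."""
    bf = BloomFilter(max(8, bits_per_element * len(postings)), num_hashes)
    for doc_id in postings:
        bf.add(doc_id)
    return bf


def approximate_intersection(shortest_postings, other_filters):
    """Keep ids from the shortest list that pass every membership test.

    The result is a superset of the exact intersection: Bloom filters
    never produce false negatives but may admit false positives.
    """
    return [d for d in shortest_postings if all(d in bf for bf in other_filters)]


# Usage: three query terms; iterate the shortest list, probe the others.
term_a = [1, 5, 9, 12, 40]
term_b = list(range(0, 100, 3))
term_c = [9, 12, 13, 40, 77]
candidates = approximate_intersection(
    term_a, [build_filter(term_b), build_filter(term_c)]
)
```

Because false positives only add extra candidates, a second-phase reranker of the kind the abstract describes can still demote them. Shrinking `bits_per_element` saves memory at the cost of more false positives, which is one axis of the effectiveness/efficiency balance the abstract mentions.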