A fixed-size Bloom filter for searching textual documents
The Computer Journal
Optimization for dynamic inverted index maintenance
SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval
Incremental updates of inverted lists for text document retrieval
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Fast evaluation of structured queries for information retrieval
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Query evaluation: strategies and optimizations
Information Processing and Management: an International Journal
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
Summary cache: a scalable wide-area web cache sharing protocol
IEEE/ACM Transactions on Networking (TON)
The stochastic approach for link-structure analysis (SALSA) and the TKC effect
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Vector-space ranking with effective early termination
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Cumulated gain-based evaluation of IR techniques
ACM Transactions on Information Systems (TOIS)
Experiments on Adaptive Set Intersections for Text Retrieval Systems
ALENEX '01 Revised Papers from the Third International Workshop on Algorithm Engineering and Experimentation
ACM SIGIR Forum
Efficient single-pass index construction for text databases
Journal of the American Society for Information Science and Technology
Efficient query evaluation using a two-level retrieval process
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Optimization strategies for complex queries
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Simplified similarity scoring using term ranks
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Indexing time vs. query time: trade-offs in dynamic information retrieval systems
Proceedings of the 14th ACM international conference on Information and knowledge management
Inverted files for text search engines
ACM Computing Surveys (CSUR)
Efficient online index maintenance for contiguous inverted lists
Information Processing and Management: an International Journal
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
High accuracy retrieval with multiple nested ranker
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Information Processing Letters
ACM Transactions on Information Systems (TOIS)
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Efficient document retrieval in main memory
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Latent concept expansion using markov random fields
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Automatic feature selection in the markov random field model for information retrieval
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Efficient on-line index maintenance for dynamic text collections by using dynamic balancing tree
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
On designing and deploying internet-scale services
LISA'07 Proceedings of the 21st conference on Large Installation System Administration Conference
Efficient online index construction for text databases
ACM Transactions on Database Systems (TODS)
On the false-positive rate of Bloom filters
Information Processing Letters
Answering general time sensitive queries
Proceedings of the 17th ACM conference on Information and knowledge management
Learning to Rank for Information Retrieval
Foundations and Trends in Information Retrieval
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Improving the performance of list intersection
Proceedings of the VLDB Endowment
Leveraging temporal dynamics of document content in relevance ranking
Proceedings of the third ACM international conference on Web search and data mining
Early exit optimizations for additive machine learned ranking systems
Proceedings of the third ACM international conference on Web search and data mining
Linear time series models for term weighting in information retrieval
Journal of the American Society for Information Science and Technology
Efficient set intersection for inverted indexing
ACM Transactions on Information Systems (TOIS)
Large-scale incremental processing using distributed transactions and notifications
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
#TwitterSearch: a comparison of microblog search and web search
Proceedings of the fourth ACM international conference on Web search and data mining
Information search and retrieval in microblogs
Journal of the American Society for Information Science and Technology
Bagging gradient-boosted trees for high precision, low variance ranking models
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
A cascade ranking model for efficient ranked retrieval
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Posting list intersection on multicore architectures
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Energy-price-driven query processing in multi-center web search engines
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Faster top-k document retrieval using block-max indexes
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Learning to Rank for Information Retrieval and Natural Language Processing
Learning to Rank for Information Retrieval and Natural Language Processing
Efficient and effective spam filtering and re-ranking for large web datasets
Information Retrieval
COCA filters: co-occurrence aware bloom filters
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Faster adaptive set intersections for text searching
WEA'06 Proceedings of the 5th international conference on Experimental Algorithms
Earlybird: Real-Time Search at Twitter
ICDE '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
On building a reusable Twitter corpus
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 21st ACM international conference on Information and knowledge management
Efficient and effective retrieval using selective pruning
Proceedings of the sixth ACM international conference on Web search and data mining
Hi-index | 0.00 |
The rise of social media and other forms of user-generated content have created the demand for real-time search: against a high-velocity stream of incoming documents, users desire a list of relevant results at the time the query is issued. In the context of real-time search on tweets, this work explores candidate generation in a two-stage retrieval architecture where an initial list of results is processed by a second-stage rescorer to produce the final output. We introduce Bloom filter chains, a novel extension of Bloom filters that can dynamically expand to efficiently represent an arbitrarily long and growing list of monotonically-increasing integers with a constant false positive rate. Using a collection of Bloom filter chains, a novel approximate candidate generation algorithm called BWand is able to perform both conjunctive and disjunctive retrieval. Experiments show that our algorithm is many times faster than competitive baselines and that this increased performance does not require sacrificing end-to-end effectiveness. Our results empirically characterize the trade-off space defined by output quality, query evaluation speed, and memory footprint for this particular search architecture.