Fast document-at-a-time query processing using two-tier indexes

Authors:
Cristian Rossi;Edleno S. de Moura;Andre L. Carvalho;Altigran S. da Silva
Affiliations:
Federal University of Amazonas, Manaus, Brazil;Federal University of Amazonas, Manaus, Brazil;Federal University of Amazonas, Manaus, Brazil;Federal University of Amazonas, Manaus, Brazil
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Citing 13
Cited 0

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Self-indexing inverted files for fast text retrieval

ACM Transactions on Information Systems (TOIS)
Vector-space ranking with effective early termination

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A Vector Space Model for Automatic Indexing

A Vector Space Model for Automatic Indexing
Multi-Tier Architecture for Web Search Engines

LA-WEB '03 Proceedings of the First Conference on Latin American Web Congress
Efficient query evaluation using a two-level retrieval process

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Pruned query evaluation using pre-computed impacts

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient document retrieval in main memory

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Pruning policies for two-tiered inverted index with correctness guarantee

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
ResIn: a combination of results caching and index pruning for high-performance web search engines
Modern Information Retrieval

Modern Information Retrieval
Faster top-k document retrieval using block-max indexes

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Optimized top-k processing with global page scores on block-max indexes

Proceedings of the fifth ACM international conference on Web search and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we present two new algorithms designed to reduce the overall time required to process top-k queries. These algorithms are based on the document-at-a-time approach and modify the best baseline we found in the literature, Blockmax WAND (BMW), to take advantage of a two-tiered index, in which the first tier is a small index containing only the higher impact entries of each inverted list. This small index is used to pre-process the query before accessing a larger index in the second tier, resulting in considerable speeding up the whole process. The first algorithm we propose, named BMW-CS, achieves higher performance, but may result in small changes in the top results provided in the final ranking. The second algorithm, named BMW-t, preserves the top results and, while slower than BMW-CS, it is faster than BMW. In our experiments, BMW-CS was more than 40 times faster than BMW when computing top 10 results, and, while it does not guarantee preserving the top results, it preserved all ranking results evaluated at this level.