We examine the problem of retrieving the top-m ranked items from a large collection that is randomly distributed across an n-node system. To guarantee that we retrieve the top m overall, we must retrieve the top m from the subcollection stored on each node and merge the results. However, if we are willing to accept a small probability that one or more of the top-m items may be missed, it is possible to reduce computation time by retrieving only the top k from each node, where k < m. In this paper, we demonstrate that this simple observation can be exploited in a realistic application to produce a substantial efficiency improvement without compromising the quality of the retrieved results. To support our claim, we present a statistical model that predicts the impact of the optimization. The paper is structured around a specific application (passage retrieval for question answering), but the primary results are more broadly applicable.
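The trade-off described above can be illustrated with a minimal sketch. It is not the statistical model from the paper. It assumes only that each of the top-m items lands on one of the n nodes independently and uniformly at random, in which case the merged result is complete exactly when no node holds more than k of the top-m items. The function names `merged_top_m` and `prob_all_top_m_found` are hypothetical and introduced here for illustration.

```python
import heapq
from fractions import Fraction
from math import factorial


def merged_top_m(node_scores, m, k):
    """Take the top k scores from each node, then the top m of the merged pool."""
    pool = [s for scores in node_scores for s in heapq.nlargest(k, scores)]
    return heapq.nlargest(m, pool)


def prob_all_top_m_found(m, n, k):
    """Exact probability that no node holds more than k of the global top-m items.

    Assumption (not taken from the paper): each item is placed on one of n
    nodes independently and uniformly at random.
    """
    # Count assignments of m labelled items to n nodes with at most k per node:
    # m! * [x^m] (sum_{j=0..k} x^j / j!)^n, then divide by n^m.
    base = [Fraction(1, factorial(j)) for j in range(min(k, m) + 1)]
    poly = [Fraction(1)]  # coefficients of the running product, truncated at x^m
    for _ in range(n):
        nxt = [Fraction(0)] * (min(len(poly) + len(base) - 2, m) + 1)
        for i, a in enumerate(poly):
            for j, b in enumerate(base):
                if i + j <= m:
                    nxt[i + j] += a * b
        poly = nxt
    if len(poly) <= m:  # n * k < m, so some item must be missed
        return Fraction(0)
    return poly[m] * factorial(m) / Fraction(n) ** m


if __name__ == "__main__":
    nodes = [[9, 8, 1], [7, 2, 3]]
    print(merged_top_m(nodes, m=3, k=2))  # complete: no node holds more than 2 of the top 3
    print(merged_top_m(nodes, m=3, k=1))  # incomplete: one node holds 2 of the top 3
    print(float(prob_all_top_m_found(m=20, n=8, k=8)))
```

Under the stated assumption, choosing k then amounts to finding the smallest value for which `prob_all_top_m_found(m, n, k)` stays above an acceptable threshold.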