Approximating the top-m passages in a parallel question answering system

Authors:
Charles L. A. Clarke;Egidio L. Terra
Affiliations:
University of Waterloo, Waterloo, Canada;University of Waterloo, Waterloo, Canada
Venue:
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Year:
2004

Citing 18
Cited 1

On saying “Enough already!” in SQL

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Efficient distributed algorithms to build inverted files

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Managing gigabytes (2nd ed.): compressing and indexing documents and images

Managing gigabytes (2nd ed.): compressing and indexing documents and images
Building a question answering test collection

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Effective ranking with arbitrary passages

Journal of the American Society for Information Science and Technology
Exploiting redundancy in question answering

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Top-k selection queries over relational databases: Mapping strategies and performance evaluation

ACM Transactions on Database Systems (TODS)
Indexing Flower Patent Images Using Domain Knowledge

IEEE Intelligent Systems
Supporting Incremental Join Queries on Ranked Inputs

Proceedings of the 27th International Conference on Very Large Data Bases
Quantitative evaluation of passage retrieval algorithms for question answering

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Searching XML documents via XML fragments

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Efficient region-based image retrieval

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
A reliable storage management layer for distributed information retrieval systems

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Analyses for elucidating current question answering technology

Natural Language Engineering
Rank-aware query optimization

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
The overlap problem in content-oriented XML retrieval evaluation

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Length normalization in XML retrieval

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Advances in Open Domain Question Answering

Advances in Open Domain Question Answering

Improving on-demand learning to rank through parallelism

WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

We examine the problem of retrieving the top-m ranked items from a large collection, randomly distributed across an n-node system. In order to retrieve the top m overall, we must retrieve the top m from the subcollection stored on each node and merge the results. However, if we are willing to accept a small probability that one or more of the top-m items may be missed, it is possible to reduce computation time by retrieving only the top k from each node. In this paper, we demonstrate that this simple observation can be exploited in a realistic application to produce a substantial efficiency improvement without compromising the quality of the retrieved results. To support our claim, we present a statistical model that predicts the impact of the optimization. The paper is structured around a specific application~---~passage retrieval for question answering~---~but the primary results are more broadly applicable.