Efficient top-k document retrieval using a term-document binary matrix

Authors:
Etsuro Fujita;Keizo Oyama
Affiliations:
The Graduate University for Advanced Studies (SOKENDAI), Tokyo, Japan;National Institute of Informatics, Tokyo, Japan
Venue:
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Year:
2011

Abstract

Current web search engines perform well for "navigational queries." However, because they rely on simple conjunctive Boolean filters, such engines perform poorly for "informational queries." Informational queries would be better handled by a web search engine that uses an information retrieval model together with enhancement techniques such as query expansion and relevance feedback, and realizing such an engine requires a method to process the model efficiently. In this paper, we describe a novel extension of an existing top-k query processing technique. We add a simple data structure called a "term-document binary matrix," resulting in more efficient evaluation of top-k queries even when the queries have been expanded. Experimental evaluation using the TREC GOV2 data set, together with expanded versions of the evaluation queries attached to this data set, shows that the extended technique achieves significant performance gains over existing techniques.
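
The abstract does not say how the term-document binary matrix is laid out or how the top-k algorithm consults it. As a rough illustration only, the sketch below assumes the matrix stores one bit per (term, document) pair, set when the term occurs in the document, so that membership can be tested without scanning a posting list. The class name `TermDocBinaryMatrix`, its methods, and the toy documents are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List


class TermDocBinaryMatrix:
    """Hypothetical layout: one bit per (term, document) pair."""

    def __init__(self, num_docs: int) -> None:
        self.num_docs = num_docs
        # term -> Python int used as a bitset; bit d is 1 if doc d contains the term
        self._rows: Dict[str, int] = {}

    def add(self, term: str, doc_id: int) -> None:
        """Record that `term` occurs in document `doc_id`."""
        if not 0 <= doc_id < self.num_docs:
            raise ValueError("doc_id out of range")
        self._rows[term] = self._rows.get(term, 0) | (1 << doc_id)

    def contains(self, term: str, doc_id: int) -> bool:
        """Test a single bit; unknown terms are treated as all-zero rows."""
        return ((self._rows.get(term, 0) >> doc_id) & 1) == 1

    def matching_terms(self, terms: Iterable[str], doc_id: int) -> List[str]:
        """Return the query terms that occur in `doc_id`, in query order."""
        return [t for t in terms if self.contains(t, doc_id)]


# Toy collection (assumed data, for illustration only).
docs = {
    0: ["top", "k", "retrieval"],
    1: ["query", "expansion"],
    2: ["top", "query"],
}
matrix = TermDocBinaryMatrix(num_docs=len(docs))
for doc_id, terms in docs.items():
    for term in terms:
        matrix.add(term, doc_id)

# An expanded query has more terms, so cheap per-term membership tests matter more.
expanded_query = ["top", "query", "feedback"]
print(matrix.matching_terms(expanded_query, 2))  # ['top', 'query']
```

Under this assumed layout, a candidate document can be checked against every term of an expanded query with one bit test per term; whether the paper uses the matrix in this way is not stated in the abstract.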