A Comparison of Techniques for Selecting Text Collections

Authors:
Daryl D'Souza;James Thom;Justin Zobel
Venue:
ADC '00 Proceedings of the Australasian Database Conference
Year:
2000

Abstract

Techniques for evaluating queries against a distributed text document database allow uniform access to its component collections. One such technique is to first choose a subset of collections via a selection index. The index captures information about each collection, such as which terms occur in which documents, term statistics, and collection statistics. One possible implementation of such an index is a lexicon, which maintains a complete list of the terms in the database. Another approach is to index the database only partially, extracting fewer terms but maintaining some information about each document. In this paper we explore three collection-ranking techniques, two based on lexicons and one based on partial document indexes. Our experiments show that in most cases the lexicon approaches outperform the partial index approach.
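
To make the idea of a lexicon-based selection index concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes a toy scoring rule (the fraction of a collection's documents containing each query term, weighted by an inverse collection frequency), and the names build_lexicon, rank_collections, and the sample collections are hypothetical, introduced only for illustration.

```python
import math
from collections import defaultdict


def build_lexicon(collections):
    """Map each term to {collection name: number of documents containing it}."""
    lexicon = defaultdict(dict)
    for name, docs in collections.items():
        for doc in docs:
            # Count each term once per document (document frequency).
            for term in set(doc.lower().split()):
                lexicon[term][name] = lexicon[term].get(name, 0) + 1
    return lexicon


def rank_collections(query, lexicon, collection_sizes):
    """Rank collections for a query using an assumed, illustrative score.

    Each query term contributes (df / collection size) * icf, where icf
    favours terms that occur in few collections. This weighting is an
    assumption for the sketch, not a formula taken from the paper.
    """
    n = len(collection_sizes)
    scores = {name: 0.0 for name in collection_sizes}
    for term in set(query.lower().split()):
        postings = lexicon.get(term, {})
        if not postings:
            continue  # term is absent from every collection
        icf = math.log(1 + n / len(postings))
        for name, df in postings.items():
            scores[name] += icf * df / collection_sizes[name]
    # Highest score first; ties are broken alphabetically by name.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical sample data: three small collections of short documents.
collections = {
    "news": ["stock markets fall", "election results announced"],
    "medicine": ["heart disease treatment", "new cancer treatment trial"],
    "sport": ["cricket results today"],
}
sizes = {name: len(docs) for name, docs in collections.items()}
lexicon = build_lexicon(collections)

ranking = rank_collections("cancer treatment", lexicon, sizes)
print(ranking)
```

In a distributed setting, only the top few collections in the ranking would then receive the query. A partial-index variant would store fewer terms per document rather than a complete term list.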