Distributed information retrieval with skewed database size distributions

Authors:
Luo Si;Jie Lu;Jamie Callan
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
Year:
2003

Citing 6
Cited 2

TREC and TIPSTER experiments with INQUERY

TREC-2 Proceedings of the second conference on Text retrieval conference
Comparing the performance of database selection algorithms

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
GlOSS: text-source discovery over the Internet

ACM Transactions on Database Systems (TODS)
Server selection on the World Wide Web

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Using sampled data and regression to merge search engine results

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Relevant document distribution estimation method for resource selection

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval

Does pseudo-relevance feedback improve distributed information retrieval systems?

Information Processing and Management: an International Journal
A Task-Based Evaluation of an Aggregated Search Interface

SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

The proliferation of government information on local area networks and the Internet creates the problem of finding information that may be distributed among many disjoint text databases (distributed information retrieval or federated search). A distributed information retrieval system is composed of three components: Resource representation, resource selection and result merging. Previous research suggested that the CORI algorithm is one of the most effective resource selection algorithms, but its effectiveness in environments containing a wide range of database sizes was not studied thoroughly. This paper shows that the CORI algorithm does not work well in environments with a skewed distribution of database sizes. We present a new resource selection algorithm based on estimating the distribution of relevant documents among the online databases. This new algorithm selects resources more accurately than the CORI algorithm, which can lead to improved document rankings.