Distributed information retrieval with skewed database size distributions

  • Authors:
  • Luo Si;Jie Lu;Jamie Callan

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • dg.o '03 Proceedings of the 2003 annual national conference on Digital government research
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

The proliferation of government information on local area networks and the Internet creates the problem of finding information that may be distributed among many disjoint text databases (distributed information retrieval or federated search). A distributed information retrieval system is composed of three components: Resource representation, resource selection and result merging. Previous research suggested that the CORI algorithm is one of the most effective resource selection algorithms, but its effectiveness in environments containing a wide range of database sizes was not studied thoroughly. This paper shows that the CORI algorithm does not work well in environments with a skewed distribution of database sizes. We present a new resource selection algorithm based on estimating the distribution of relevant documents among the online databases. This new algorithm selects resources more accurately than the CORI algorithm, which can lead to improved document rankings.