Federated text retrieval from uncooperative overlapped collections

  • Authors:
  • Milad Shokouhi;Justin Zobel

  • Affiliations:
  • RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia

  • Venue:
  • SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

In federated text retrieval systems, the query is sent to multiple collections at the same time. The results returned by collections are gathered and ranked by a central broker that presents them to the user. It is usually assumed that the collections have little overlap. However, in practice collections may share many common documents as either exact or near duplicates, potentially leading to high numbers of duplicates in the final results. Considering the natural band width restrictions and efficiency issues of federated search, sendingqueries to redundant collections leads to unnecessary costs. We propose a novel method for estimating the rate of over-lap among collections based on sampling. Then, using theestimated overlap statistics, we propose two collection selection methods that aim to maximize the number of unique relevant documents in the final results. We show experimentally that, although our estimates of overlap are not in exact, our suggested techniques can significantly improve the search effectiveness when collections overlap.