Metasearch. Properties of Common Documents Distributions

  • Authors: Nikolai Buzikashvili

  • Venue: PAKM '02 Proceedings of the 4th International Conference on Practical Aspects of Knowledge Management
  • Year: 2002

Abstract

The effectiveness of metasearch data fusion procedures depends crucially on the properties of the distributions of common documents. Because we usually know neither how different search engines assign relevance scores nor how similar these assignments are, the documents common to the individual ranked lists are the only basis for combining search results. It is therefore important to study the properties of common-document distributions. One of these properties is the Overlap Property (OP) of documents retrieved by different search engines. According to OP, the overlap between relevant documents is usually greater than the overlap between non-relevant ones. Although OP has been repeatedly observed and discussed, no theoretical explanation of this empirical property has been developed. This paper presents a formal study of the properties of common-document distributions. In particular, a necessary and sufficient condition for OP is derived, and it is proved that OP should hold under practically arbitrary circumstances.
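To make the Overlap Property concrete, the sketch below compares the overlap of relevant and of non-relevant documents in two ranked lists. It is an illustration only, not the paper's formalism: the Dice-style coefficient 2|A∩B| / (|A| + |B|) is one common way to quantify overlap and is assumed here, and the function names, document identifiers, and relevance judgments are all hypothetical toy data.

```python
def overlap(a, b):
    """Dice-style overlap between two collections: 2|A∩B| / (|A| + |B|).

    Assumed measure for illustration; the abstract does not fix a formula.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets have zero overlap
    return 2 * len(a & b) / (len(a) + len(b))


def overlap_by_relevance(list_a, list_b, relevant):
    """Return (relevant overlap, non-relevant overlap) for two result lists."""
    rel_a = [d for d in list_a if d in relevant]
    rel_b = [d for d in list_b if d in relevant]
    non_a = [d for d in list_a if d not in relevant]
    non_b = [d for d in list_b if d not in relevant]
    return overlap(rel_a, rel_b), overlap(non_a, non_b)


# Hypothetical results from two search engines and toy relevance judgments.
engine_a = ["d1", "d2", "d3", "d4", "d5", "d6"]
engine_b = ["d2", "d1", "d7", "d3", "d4", "d9"]
relevant = {"d1", "d2", "d3", "d7"}

r_overlap, n_overlap = overlap_by_relevance(engine_a, engine_b, relevant)
print(f"relevant overlap:     {r_overlap:.3f}")
print(f"non-relevant overlap: {n_overlap:.3f}")
print("OP holds on this toy data:", r_overlap > n_overlap)
```

On this toy data the two engines share three relevant documents but only one non-relevant document, so the relevant overlap exceeds the non-relevant one, which is the pattern OP describes.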