Relevance weighting using within-document term statistics

  • Authors: Kai Hui; Ben He; Tiejian Luo; Bin Wang

  • Affiliations: Graduate University of Chinese Academy of Sciences, Beijing, China; Graduate University of Chinese Academy of Sciences, Beijing, China; Graduate University of Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Beijing, China

  • Venue: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
  • Year: 2011

Abstract

With the rapid development of information technology, deploying state-of-the-art retrieval models is difficult in environments such as peer-to-peer networks and pervasive computing, where maintaining global statistics is expensive or even infeasible. To this end, this paper investigates the validity of different statistical assumptions about term distributions. Based on the findings of this investigation, a variety of weighting models, called NG (standing for "no global statistics") models, are derived from the Divergence from Randomness framework; they use only within-document statistics in relevance weighting. In extensive experiments on various standard TREC test collections, the proposed NG models provide acceptable retrieval performance in ad-hoc search compared with state-of-the-art weighting models, without using global statistics.
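
To make the idea of weighting with only within-document statistics concrete, here is a minimal Python sketch. It is not one of the paper's NG models, whose formulas are not given in this abstract. The function name `within_document_score` and the scoring formula (a log-saturated term frequency normalized by document length) are assumptions chosen purely for illustration. The point it shows is that the score needs no collection-wide quantities such as document frequency or average document length.

```python
import math
from collections import Counter


def within_document_score(query_terms, doc_tokens):
    """Score one document for a query using only that document's own statistics.

    Hypothetical illustration: the formula below is an assumption,
    not an NG model from the paper.
    """
    doc_len = len(doc_tokens)
    if doc_len == 0:
        return 0.0

    counts = Counter(doc_tokens)
    # Length normalizer computed from this document alone (no global average).
    norm = math.log2(1.0 + doc_len)

    score = 0.0
    for term in query_terms:
        tf = counts[term]  # within-document term frequency
        if tf > 0:
            # Log-saturated term frequency, normalized by document length.
            score += math.log2(1.0 + tf) / norm
    return score


if __name__ == "__main__":
    doc_a = ["p2p", "network", "p2p"]
    doc_b = ["p2p", "network", "search"]
    print(within_document_score(["p2p"], doc_a))
    print(within_document_score(["p2p"], doc_b))
```

Because each document is scored in isolation, a peer in a peer-to-peer network could rank its local documents without exchanging index statistics with other peers, which is the deployment setting the abstract describes.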