Relevance weighting using within-document term statistics

Authors:
Kai Hui;Ben He;Tiejian Luo;Bin Wang
Affiliations:
Graduate University of Chinese Academy of Sciences, Beijing, China;Graduate University of Chinese Academy of Sciences, Beijing, China;Graduate University of Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Beijing, China
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 9
Cited 0

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Dissemination of collection wide information in a distributed information retrieval system

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness

ACM Transactions on Information Systems (TOIS)
TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing)

TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing)
Global term weights in distributed environments

Information Processing and Management: an International Journal
Introduction to Information Retrieval

Introduction to Information Retrieval
Statistical Language Models for Information Retrieval A Critical Review

Foundations and Trends in Information Retrieval
A statistical approach to mechanized encoding and searching of literary information

IBM Journal of Research and Development

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the rapid development of the information technology, there exists the difficulty in deploying state-of-the-art retrieval models in environments such as peer-to-peer networks and pervasive computing, where it is expensive or even infeasible to maintain the global statistics. To this end, this paper presents an investigation in the validity of different statistical assumptions of term distributions. Based on the findings in this investigation, a variety of weighting models, called NG (standing for "no global statistics") models, are derived from the Divergence from Randomness framework, in which only the within-document statistics are used in the relevance weighting. Compared to the state-of-the-art weighting models in extensive experiments on various standard TREC test collections, our proposed NG models can provide acceptable retrieval performance in ad-hoc search, without the use of global statistics.