Term necessity prediction

  • Authors:
  • Le Zhao; Jamie Callan

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA

  • Venue:
  • CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management
  • Year:
  • 2010

Abstract

The probability that a term appears in relevant documents (P(t | R)) is a fundamental quantity in several probabilistic retrieval models; however, it is difficult to estimate without relevance judgments or a relevance model. We call this value term necessity because it measures the percentage of relevant documents retrieved by the term, i.e., how necessary a term's occurrence is to document relevance. Prior research typically either set this probability to a constant or estimated it from the term's inverse document frequency, neither of which was very effective. This paper identifies several factors that affect term necessity, such as a term's topic centrality, synonymy, and abstractness. It develops term- and query-dependent features for each factor that enable supervised learning of a predictive model of term necessity from training data. Experiments with two popular retrieval models and six standard datasets demonstrate that using predicted term necessity estimates as user term weights on the original query terms leads to significant improvements in retrieval accuracy.
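
To make the abstract's central quantity concrete, the sketch below (a toy illustration, not the authors' implementation; the helper names and example data are invented) computes the empirical term necessity P(t | R) as the fraction of relevant documents containing a term, and shows how such values could serve as per-term user weights on the original query terms, which is how the predicted estimates are applied in the paper's experiments.

```python
# Illustrative sketch only: the helpers, the toy judgments, and the use of exact
# relevance judgments are assumptions made for illustration. The paper's point is
# to *predict* P(t | R) from term- and query-dependent features when no judgments
# are available; this only shows what the quantity measures and how it could be
# used to weight query terms.

from typing import Dict, Iterable, List, Set, Tuple


def term_necessity(term: str, relevant_docs: Iterable[Set[str]]) -> float:
    """Empirical P(t | R): fraction of relevant documents that contain `term`."""
    docs = list(relevant_docs)
    if not docs:
        return 0.0
    return sum(1 for doc in docs if term in doc) / len(docs)


def weighted_query(query_terms: List[str],
                   necessity: Dict[str, float]) -> List[Tuple[str, float]]:
    """Pair each original query term with its (predicted) necessity, to be used
    as a user term weight by a retrieval model that supports weighted queries."""
    return [(t, necessity.get(t, 1.0)) for t in query_terms]


if __name__ == "__main__":
    # Toy relevance judgments: each relevant document represented as its term set.
    relevant = [
        {"oil", "spill", "cleanup", "gulf"},
        {"oil", "spill", "tanker"},
        {"oil", "pollution", "coast"},
    ]
    query = ["oil", "spill", "cleanup"]
    necessities = {t: term_necessity(t, relevant) for t in query}
    print(necessities)                      # oil -> 1.0, spill -> 2/3, cleanup -> 1/3
    print(weighted_query(query, necessities))
```

A term occurring in every relevant document (necessity 1.0) is one the retrieval model should insist on, while a low-necessity term (e.g. one with many synonyms in the relevant set) should be down-weighted rather than required.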