GJM-2: a special case of General Jelinek-Mercer smoothing method for language modeling approach to ad hoc IR

  • Authors:
  • Guodong Ding; Bin Wang

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
  • Year:
  • 2005

Abstract

The language modeling approach to IR is attractive and promising because it connects the problem of retrieval with that of language model estimation. A core technique for language model estimation is smoothing, which adjusts the maximum likelihood estimator to correct the inaccuracy caused by data sparseness. In this paper we propose a General Jelinek-Mercer method (GJM), which uses a document-dependent mixture coefficient to control the relative influence of the maximum likelihood model and the collection model. By using the number of unique terms in a document to improve the accuracy of language model estimation, we further develop the GJM-2 smoothing method as a special case of GJM. Experimental results show that using GJM-2 in the language modeling approach can achieve better retrieval performance than three existing popular smoothing methods on both short and long queries.
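To make the idea of a document-dependent mixture coefficient concrete, here is a minimal sketch of query-likelihood scoring with Jelinek-Mercer style smoothing, p(w|d) = (1 - λ_d) p_ml(w|d) + λ_d p(w|C), where the weight λ_d on the collection model varies per document. The abstract does not give the actual GJM-2 formula, so the function `unique_term_lambda` uses a Witten-Bell-style weight λ_d = u / (u + |d|) (u = number of unique terms in d) purely as an assumed stand-in that also depends on unique-term counts. All function names are hypothetical, and this is an illustration, not the authors' method.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum likelihood estimate p(w|C) over all tokens in the collection."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def unique_term_lambda(doc):
    """Hypothetical document-dependent weight on the collection model.

    ASSUMPTION: Witten-Bell-style choice lambda_d = u / (u + |d|), where u is
    the number of unique terms in d. This is not the paper's GJM-2 formula.
    Assumes a non-empty document.
    """
    u = len(set(doc))
    return u / (u + len(doc))


def smoothed_prob(word, doc_counts, doc_len, lam, p_coll):
    """Jelinek-Mercer mixture: (1 - lam) * p_ml(w|d) + lam * p(w|C)."""
    p_ml = doc_counts[word] / doc_len
    return (1 - lam) * p_ml + lam * p_coll.get(word, 0.0)


def query_log_likelihood(query, doc, p_coll, lam_fn=unique_term_lambda):
    """Score a document by the log-probability its smoothed model assigns to the query."""
    doc_counts = Counter(doc)
    lam = lam_fn(doc)  # mixture coefficient chosen per document
    score = 0.0
    for w in query:
        p = smoothed_prob(w, doc_counts, len(doc), lam, p_coll)
        if p == 0.0:
            return float("-inf")  # term unseen anywhere in the collection
        score += math.log(p)
    return score
```

For example, with `docs = [["language", "model", "smoothing", "model"], ["retrieval", "evaluation", "query"]]`, the query `["language", "model"]` scores higher against the first document. Passing `lam_fn=lambda d: 0.5` recovers classic Jelinek-Mercer smoothing with a single fixed coefficient, which is the setting GJM generalizes.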