Smoothing document language models with probabilistic term count propagation

  • Authors: Azadeh Shakery; Chengxiang Zhai
  • Affiliations: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA 61801 (both authors)
  • Venue: Information Retrieval
  • Year: 2008

Abstract

Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper, we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can propagate counts iteratively and thus smooth a document with remotely related documents, not only with its immediate neighbors. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the simple collection-based smoothing method. Compared with other smoothing methods that also exploit local corpus structures, our method is especially effective in improving precision in top-ranked documents by "filling in" missing query terms in relevant documents. This is attractive because most users of search engine applications pay attention only to the top-ranked documents.
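
The idea of iterative count propagation can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes a simple update in which each document keeps a fraction (1 - alpha) of its original counts and receives a fraction alpha of its neighbors' current counts, weighted by transition probabilities. The function name `propagate_counts`, the parameter `alpha`, the toy documents, and the transition probabilities are all hypothetical.

```python
from collections import Counter


def propagate_counts(counts, transitions, alpha=0.5, iterations=2):
    """Illustrative count propagation over a document graph (assumed update rule).

    counts:      {doc_id: Counter of original term counts}
    transitions: {doc_id: {neighbor_id: probability}}, each row summing to 1
    alpha:       hypothetical weight given to counts received from neighbors
    """
    current = {doc: dict(terms) for doc, terms in counts.items()}
    for _ in range(iterations):
        updated = {}
        for doc, original in counts.items():
            # Keep a (1 - alpha) share of the document's own original counts.
            new = {term: (1 - alpha) * c for term, c in original.items()}
            # Add an alpha share of each neighbor's current (already smoothed) counts.
            for neighbor, prob in transitions.get(doc, {}).items():
                for term, c in current[neighbor].items():
                    new[term] = new.get(term, 0.0) + alpha * prob * c
            updated[doc] = new
        current = updated
    return current


# Toy chain A - B - C: A and C are not directly connected.
counts = {
    "A": Counter({"retrieval": 2}),
    "B": Counter({"retrieval": 1, "smoothing": 1}),
    "C": Counter({"smoothing": 1, "language": 3}),
}
transitions = {
    "A": {"B": 1.0},
    "B": {"A": 0.5, "C": 0.5},
    "C": {"B": 1.0},
}

one_step = propagate_counts(counts, transitions, iterations=1)
two_steps = propagate_counts(counts, transitions, iterations=2)

print("language" in one_step["A"])   # False: C is two hops away from A
print(two_steps["A"]["language"])    # 0.375: the count reached A through B
```

In this toy graph, the term "language" occurs only in document C. After one iteration it has reached only B; after the second it appears in A with a fractional count, which is the sense in which iteration lets a document be smoothed with remotely related documents.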