Exploiting proximity feature in bigram language model for information retrieval

  • Authors:
  • Seung-Hoon Na; Jungi Kim; In-Su Kang; Jong-Hyeok Lee

  • Affiliations:
  • Pohang University of Science and Technology (POSTECH), Pohang, South Korea; Pohang University of Science and Technology (POSTECH), Pohang, South Korea; Kyungsung University, Pusan, South Korea; Pohang University of Science and Technology (POSTECH), Pohang, South Korea

  • Venue:
  • Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2008

Abstract

Language modeling approaches effectively handle dependencies among query terms using N-gram models such as bigram or trigram models. However, bigram language models suffer from the adjacency-sparseness problem: dependent terms are not always adjacent in documents, but can be far apart, sometimes separated by several sentences. To resolve the adjacency-sparseness problem, this paper proposes a new type of bigram language model that explicitly incorporates a proximity feature between two adjacent terms in a query. Experimental results on three test collections show that the proposed bigram language model significantly improves on the previous bigram model as well as on Tao's approach, the state-of-the-art proximity-based method.
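The core idea, scoring a query bigram by how close its two terms occur in a document rather than requiring strict adjacency, can be sketched as follows. This is a minimal illustration, assuming an exponential decay kernel and a `sigma` parameter that are not taken from the paper; the paper's actual model integrates proximity into a bigram language model rather than this simple additive score.

```python
import math

def min_pair_distance(positions_a, positions_b):
    """Smallest absolute distance between any occurrence of term a and term b."""
    best = math.inf
    for pa in positions_a:
        for pb in positions_b:
            best = min(best, abs(pa - pb))
    return best

def proximity_weight(dist, sigma=5.0):
    """Illustrative decay kernel: strictly adjacent terms (dist = 1) get
    weight 1.0, and the weight falls off as the terms drift apart."""
    return math.exp(-(dist - 1) / sigma)

def score_query_bigrams(query_terms, doc_positions):
    """Sum proximity-weighted scores over consecutive query-term pairs.

    doc_positions maps each term to the list of its token positions in
    the document; pairs with a missing term contribute nothing.
    """
    score = 0.0
    for a, b in zip(query_terms, query_terms[1:]):
        if a in doc_positions and b in doc_positions:
            d = min_pair_distance(doc_positions[a], doc_positions[b])
            score += proximity_weight(d)
    return score

# Example: "language model" occurring adjacently vs. six tokens apart.
adjacent = score_query_bigrams(["language", "model"],
                               {"language": [0, 10], "model": [1]})
distant = score_query_bigrams(["language", "model"],
                              {"language": [0], "model": [6]})
```

Here `adjacent` evaluates to 1.0 (distance 1), while `distant` is smaller (distance 6), capturing the intuition that a dependent term pair should still contribute to the bigram score when the terms are nearby but not adjacent.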