A general language model for information retrieval

  • Authors:
  • Fei Song; W. Bruce Croft

  • Affiliations:
  • Dept. of Computing and Info. Science, University of Guelph, Guelph, Ontario, Canada N1G 2W1; Dept. of Computer Science, University of Massachusetts, Amherst, Massachusetts

  • Venue:
  • Proceedings of the eighth international conference on Information and knowledge management
  • Year:
  • 1999

Abstract

Statistical language modeling has been successfully used for speech recognition, part-of-speech tagging, and syntactic parsing. Recently, it has also been applied to information retrieval. Under this new paradigm, each document is viewed as a language sample, and a query as a generation process. The retrieved documents are ranked by the probability of generating the query from each document's language model. In this paper, we present a new language model for information retrieval, based on a range of data smoothing techniques, including the Good-Turing estimate, curve-fitting functions, and model combinations. Our model is conceptually simple and intuitive, and can be easily extended to incorporate probabilities of phrases such as word pairs and word triples. Experiments with the Wall Street Journal and TREC4 data sets show that the performance of our model is comparable to that of INQUERY and better than that of another language model for information retrieval. In particular, word pairs are shown to be useful in improving retrieval performance.
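
The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a plain unigram model per document, smoothed by linear interpolation with a corpus-wide model as a simple stand-in for the "model combinations" the abstract mentions, and it omits the Good-Turing estimate, curve fitting, and word pairs. The function names and the weight `lam` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities for a token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def query_log_likelihood(query, doc_tokens, corpus_model, lam=0.5):
    """Log-probability of generating the query from a smoothed document model.

    Assumption: smoothing is a linear interpolation of the document model
    and the corpus model with weight `lam` (a hypothetical parameter).
    """
    doc_model = unigram_model(doc_tokens)
    score = 0.0
    for word in query:
        p = lam * doc_model.get(word, 0.0) + (1 - lam) * corpus_model.get(word, 0.0)
        if p == 0.0:
            # The word appears nowhere in the corpus, so the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document names ordered by decreasing query log-likelihood."""
    corpus_tokens = [word for tokens in docs.values() for word in tokens]
    corpus_model = unigram_model(corpus_tokens)
    scored = [
        (query_log_likelihood(query, tokens, corpus_model, lam), name)
        for name, tokens in docs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy collection, invented for this illustration only.
docs = {
    "d1": "stock market prices fell".split(),
    "d2": "the weather was sunny".split(),
}
ranking = rank(["stock", "prices"], docs)
```

Without the corpus term, any document missing a single query word would receive probability zero; smoothing is what lets such documents still be scored and compared.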