Exploiting proximity feature in statistical translation models for information retrieval

  • Authors:
  • Xinhui Tu; Jing Luo; Bo Li; Tingting He; Maofu Liu

  • Affiliations:
  • Wuhan University of Science and Technology, Wuhan, China; Wuhan University of Science and Technology, Wuhan, China; Central China Normal University, Wuhan, China; Central China Normal University, Wuhan, China; Wuhan University of Science and Technology, Wuhan, China

  • Venue:
  • Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
  • Year:
  • 2013

Abstract

A main challenge in applying translation language models to information retrieval is estimating the 'true' probability that a query could be generated as a translation of a document. State-of-the-art methods rely on document-based word co-occurrences to estimate word-to-word translation probabilities, but they do not take the proximity of those co-occurrences into account. Intuitively, proximity can be exploited to estimate more accurate translation probabilities, since two words that occur closer together are more likely to be related. In this paper, we study how to explicitly incorporate proximity information into the existing translation language model and propose a proximity-based translation language model, called TM-P, with three variants. The TM-P models introduce a new concept, proximity-based word co-occurrence frequency, to model the proximity of word co-occurrences; this frequency is then used to estimate translation probabilities. Experimental results on standard TREC collections show that the TM-P models achieve significant improvements over state-of-the-art translation models.
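
The idea of a proximity-based co-occurrence frequency can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the Gaussian distance kernel, the `sigma` parameter, the function names `proximity_cooccurrence` and `translation_probs`, and the simple row normalization. It is not the TM-P model or any of its three variants, and it omits details such as self-translation handling and smoothing.

```python
import math
from collections import defaultdict


def proximity_cooccurrence(docs, sigma=2.0):
    """Accumulate distance-weighted co-occurrence counts for word pairs.

    Assumption: each pair of positions (i, j) in the same document
    contributes exp(-(i - j)^2 / (2 * sigma^2)), so nearby pairs count
    more than distant ones. A plain document-based count would add 1
    for every pair regardless of distance.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, u in enumerate(tokens):
            for j, w in enumerate(tokens):
                if i == j:
                    continue  # a position does not co-occur with itself
                weight = math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
                counts[u][w] += weight
    return counts


def translation_probs(counts):
    """Normalize each row so that p(w | u) sums to 1 over all w."""
    probs = {}
    for u, row in counts.items():
        total = sum(row.values())
        probs[u] = {w: c / total for w, c in row.items()}
    return probs


# Toy corpus (hypothetical, not from the paper's TREC experiments).
docs = [
    ["proximity", "feature", "translation", "model"],
    ["translation", "model", "retrieval"],
]
p = translation_probs(proximity_cooccurrence(docs, sigma=2.0))
```

In this toy corpus, "model" is always adjacent to "translation" while "retrieval" is two positions away, so `p["translation"]["model"]` comes out larger than `p["translation"]["retrieval"]`. A smaller `sigma` sharpens the preference for close pairs; a very large `sigma` approaches distance-independent co-occurrence counting.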