Exploiting proximity feature in statistical translation models for information retrieval

Authors:
Xinhui Tu;Jing Luo;Bo Li;Tingting He;Maofu Liu
Affiliations:
Wuhan University of Science and Technology, Wuhan, China;Wuhan University of Science and Technology, Wuhan, China;Central China Normal University, Wuhan, China;Central China Normal University, Wuhan, China;Wuhan University of Science and Technology, Wuhan, China
Venue:
Proceedings of the 22nd ACM international conference on information & knowledge management
Year:
2013

Abstract

A main challenge in applying translation language models to information retrieval is how to estimate the 'true' probability that a query could be generated as a translation of a document. State-of-the-art methods rely on document-based word co-occurrences to estimate word-word translation probabilities. However, these methods do not take the proximity of co-occurrences into account. Intuitively, proximity can be exploited to estimate more accurate translation probabilities, since two words that occur closer together are more likely to be related. In this paper, we study how to explicitly incorporate proximity information into the existing translation language model, and propose a proximity-based translation language model, called TM-P, with three variants. In our TM-P models, a new concept (proximity-based word co-occurrence frequency) is introduced to model the proximity of word co-occurrences, which is then used to estimate translation probabilities. Experimental results on standard TREC collections show that our TM-P models achieve significant improvements over the state-of-the-art translation models.
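
The idea of a proximity-based word co-occurrence frequency can be illustrated with a minimal sketch. This is not the TM-P estimator from the paper, whose exact formulas are not given in the abstract. It assumes a Gaussian distance kernel with a hypothetical width parameter `sigma`, and the function names `proximity_cooccurrence` and `translation_probabilities` are invented for illustration. Each pair of word occurrences in a document contributes a weight that decays with their positional distance, and the weighted counts are then normalized into word-word translation probabilities.

```python
import math
from collections import defaultdict


def proximity_cooccurrence(docs, sigma=2.0):
    """Accumulate proximity-weighted co-occurrence counts.

    docs is a list of token lists. Every pair of occurrences at distinct
    positions i and j in the same document adds exp(-d^2 / (2 * sigma^2))
    to the count for (tokens[i], tokens[j]), where d = |i - j|.
    The Gaussian kernel and sigma are assumptions made for this sketch.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, w in enumerate(tokens):
            for j, u in enumerate(tokens):
                if i == j:
                    continue  # skip pairing an occurrence with itself
                d = abs(i - j)
                counts[w][u] += math.exp(-(d * d) / (2.0 * sigma * sigma))
    return counts


def translation_probabilities(counts):
    """Normalize each row of weighted counts so that p(u | w) sums to 1."""
    probs = {}
    for w, row in counts.items():
        total = sum(row.values())
        if total > 0:
            probs[w] = {u: c / total for u, c in row.items()}
    return probs


if __name__ == "__main__":
    docs = [
        ["query", "translation", "model", "for", "retrieval"],
        ["translation", "model", "estimation"],
    ]
    probs = translation_probabilities(proximity_cooccurrence(docs))
    # Words adjacent to "translation" receive more probability mass
    # than words that only co-occur with it at a distance.
    for u, p in sorted(probs["translation"].items(), key=lambda kv: -kv[1]):
        print(f"p({u} | translation) = {p:.3f}")
```

A purely document-based estimate would treat every pair in the same document equally; here the kernel width controls how quickly distant pairs are discounted, and a very large `sigma` approaches the unweighted case.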