Estimation of statistical translation models based on mutual information for ad hoc information retrieval

Authors:
Maryam Karimzadehgan; ChengXiang Zhai
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA; University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Year:
2010

Abstract

As a principled approach to capturing semantic relations between words in information retrieval, statistical translation models have been shown to outperform simple document language models, which rely on exact matching of words in the query and documents. A major challenge in applying translation models to ad hoc information retrieval is estimating a translation model without training data. Existing work has relied on training with synthetic queries generated from a document collection. However, this method is computationally expensive and does not provide good coverage of query words. In this paper, we propose an alternative way to estimate a translation model based on normalized mutual information between words, which is less computationally expensive and has better coverage of query words than the synthetic query method of estimation. We also propose to regularize the estimated translation probabilities to ensure sufficient probability mass for self-translation. Experimental results show that the proposed mutual information-based estimation method is not only more efficient but also more effective than the synthetic query-based method, and that it can be combined with pseudo-relevance feedback to further improve retrieval accuracy. The results also show that the proposed regularization strategy is effective and can improve retrieval accuracy for both synthetic query-based estimation and mutual information-based estimation.
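
The abstract describes the estimation method only at a high level, so the following Python sketch is one plausible reading rather than the authors' exact procedure. It assumes mutual information is computed between binary "word occurs in document" indicators, that each word's scores are normalized to sum to one, and that self-translation is boosted by linear interpolation with a weight `alpha`. The function names, the `alpha` parameter, and the restriction to word pairs that co-occur in at least one document are all illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def mutual_information(n11, n_w, n_u, n):
    """MI between two binary presence indicators, from document counts.

    n11: documents containing both words; n_w, n_u: documents containing
    each word; n: total number of documents.
    """
    cells = {
        (1, 1): n11,
        (1, 0): n_w - n11,
        (0, 1): n_u - n11,
        (0, 0): n - n_w - n_u + n11,
    }
    mi = 0.0
    for (a, b), count in cells.items():
        if count == 0:
            continue  # a zero cell contributes nothing to the sum
        p_ab = count / n
        p_a = (n_w if a else n - n_w) / n
        p_b = (n_u if b else n - n_u) / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def estimate_translation_model(docs, alpha=0.5):
    """Return table[u][w], an estimate of p(w | u).

    Assumption: only pairs that co-occur in some document are scored,
    and alpha is a hypothetical self-translation weight.
    """
    n = len(docs)
    df = defaultdict(int)  # document frequency of each word
    co = defaultdict(int)  # co-occurrence document counts per word pair
    for doc in docs:
        words = sorted(set(doc))
        for w in words:
            df[w] += 1
        for w, u in combinations(words, 2):
            co[(w, u)] += 1

    scores = defaultdict(dict)
    for w in df:
        # MI of a word with itself is the entropy of its indicator.
        scores[w][w] = mutual_information(df[w], df[w], df[w], n)
    for (w, u), n11 in co.items():
        mi = mutual_information(n11, df[w], df[u], n)
        scores[u][w] = mi
        scores[w][u] = mi

    table = {}
    for u, row in scores.items():
        total = sum(row.values())
        if total == 0.0:
            table[u] = {u: 1.0}  # degenerate case: keep all mass on u
            continue
        # Normalize, then reserve alpha of the mass for self-translation.
        table[u] = {
            w: (1 - alpha) * mi / total + (alpha if w == u else 0.0)
            for w, mi in row.items()
        }
    return table
```

Each row of the resulting table is a probability distribution in which the self-translation entry receives at least `alpha` of the mass, which matches the stated goal of the regularization. Counting document frequencies and co-occurrences takes a single pass over the collection, which is consistent with the efficiency claim relative to training on synthetic queries.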