A study of statistical models for query translation: finding a good unit of translation

  • Authors:
  • Jianfeng Gao;Jian-Yun Nie

  • Affiliations:
  • Microsoft Research;Université de Montréal

  • Venue:
  • SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents a study of three statistical query translation models that use different units of translation. We begin with a review of a word-based translation model that uses co-occurrence statistics for resolving translation ambiguities. The translation selection problem is then formulated under the framework of graphic model resorting to which the modeling assumptions and limitations of the co-occurrence model are discussed, and the research of finding better translation units is motivated. Then, two other models that use larger, linguistically motivated translation units (i.e., noun phrase and dependency triple) are presented. For each model, the modeling and training methods are described in detail. All query translation models are evaluated using TREC collections. Results show that larger translation units lead to more specific models that usually achieve better translation and cross-language information retrieval results.