Improving query translation for cross-language information retrieval using statistical models

  • Authors:
  • Jianfeng Gao;Jian-Yun Nie;Endong Xun;Jian Zhang;Ming Zhou;Changning Huang

  • Affiliations:
  • Microsoft Research China;Univ. of Montreal, Montreal, P.Q., Canada;Microsoft Research China;Tsinghua Univ., China;Tsinghua Univ., China;Microsoft Research China

  • Venue:
  • Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2001

Abstract

Dictionaries have often been used for query translation in cross-language information retrieval (CLIR). However, this approach suffers from translation ambiguity: a dictionary typically stores multiple translations for a word. In addition, word-by-word query translation is not precise enough. In this paper, we explore several methods to improve dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole, using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques yield significant improvements over simple dictionary approaches, and even outperform a high-quality machine translation system.
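
To make the idea of cohesion-based translation selection concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how cohesion is computed or how the search is performed, so this example assumes a precomputed pairwise cohesion score and uses brute-force enumeration. The dictionary entries, the scores, and the names `CANDIDATES`, `COHESION`, `cohesion`, and `select_translations` are all hypothetical.

```python
from itertools import product

# Hypothetical toy bilingual dictionary: each source word maps to several
# candidate translations (pinyin stand-ins, for illustration only).
CANDIDATES = {
    "bank": ["yinhang", "hean"],      # financial institution / river bank
    "interest": ["lixi", "xingqu"],   # interest on money / personal interest
}

# Hypothetical symmetric cohesion scores between target-language words,
# e.g. as might be estimated from co-occurrence in a target-language corpus.
COHESION = {
    frozenset(["yinhang", "lixi"]): 0.9,
    frozenset(["yinhang", "xingqu"]): 0.1,
    frozenset(["hean", "lixi"]): 0.05,
    frozenset(["hean", "xingqu"]): 0.2,
}


def cohesion(a, b):
    """Return the assumed cohesion of two target words (0.0 if unknown)."""
    return COHESION.get(frozenset([a, b]), 0.0)


def select_translations(words, candidates):
    """Pick one translation per word so that total pairwise cohesion is maximal.

    Brute force over all combinations: exponential in query length, so this
    is only suitable for short queries or as a reference implementation.
    """
    best_combo, best_score = None, float("-inf")
    for combo in product(*(candidates[w] for w in words)):
        score = sum(
            cohesion(combo[i], combo[j])
            for i in range(len(combo))
            for j in range(i + 1, len(combo))
        )
        if score > best_score:
            best_combo, best_score = combo, score
    return dict(zip(words, best_combo))


if __name__ == "__main__":
    print(select_translations(["bank", "interest"], CANDIDATES))
```

With these toy scores, the ambiguous words disambiguate each other: the financial senses are chosen together because that pair has the highest cohesion. For longer queries, a greedy or approximate search would be needed in place of full enumeration.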