Using mutual information to resolve query translation ambiguities and query term weighting

  • Authors:
  • Myung-Gil Jang;Sung Hyon Myaeng;Se Young Park

  • Affiliations:
  • Electronics and Telecommunications Research Institute, Taejon, Korea;Chungnam National University, Taejon, Korea;Electronics and Telecommunications Research Institute, Taejon, Korea

  • Venue:
  • ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
  • Year:
  • 1999

Abstract

An easy way of translating queries from one language into another for cross-language information retrieval (IR) is to use a simple bilingual dictionary. Because of the general-purpose nature of such dictionaries, however, this simple method yields a severe translation ambiguity problem. This paper describes the degree to which this problem arises in Korean-English cross-language IR and suggests a relatively simple yet effective method for disambiguation using mutual information statistics obtained only from the target document collection. In this method, mutual information is used not only to select the best translation candidate but also to assign a weight to query terms in the target language. Our experimental results based on the TREC-6 collection show that this method can achieve up to 85% of the performance of monolingual retrieval and 96% of the performance of manual disambiguation.
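
The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of the general idea, under stated assumptions: mutual information is approximated by positive pointwise mutual information over document-level co-occurrence in the target collection, each source term keeps the dictionary candidate that associates most strongly with any candidate of the other query terms, and that score is reused as the term weight. The names `build_counts`, `ppmi`, and `disambiguate`, and the toy collection, are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def build_counts(docs):
    """Count document frequencies of terms and of unordered term pairs."""
    df, pair_df = {}, {}
    for doc in docs:
        terms = sorted(set(doc))
        for t in terms:
            df[t] = df.get(t, 0) + 1
        for a, b in combinations(terms, 2):
            pair_df[(a, b)] = pair_df.get((a, b), 0) + 1
    return df, pair_df, len(docs)


def ppmi(a, b, df, pair_df, n_docs):
    """Positive pointwise mutual information of two target-language terms.

    Assumption: co-occurrence is counted per document, and pairs that never
    co-occur (or negative associations) score 0.0.
    """
    if a == b:
        return 0.0
    key = (a, b) if a < b else (b, a)
    joint = pair_df.get(key, 0)
    if joint == 0:
        return 0.0
    return max(0.0, math.log2(joint * n_docs / (df[a] * df[b])))


def disambiguate(candidates, df, pair_df, n_docs):
    """Pick one translation per source term and attach a weight.

    candidates: one list of dictionary translations per source query term.
    Returns a list of (chosen_term, weight) pairs; terms with no
    candidates are skipped.
    """
    result = []
    for i, cands in enumerate(candidates):
        if not cands:
            continue
        best, best_score = None, float("-inf")
        for c in cands:
            # Strongest association with any candidate of another query term.
            score = max(
                (ppmi(c, o, df, pair_df, n_docs)
                 for j, others in enumerate(candidates) if j != i
                 for o in others),
                default=0.0,
            )
            if score > best_score:
                best, best_score = c, score
        result.append((best, best_score))
    return result


# Toy target-language collection (hypothetical data, for illustration only).
docs = [
    ["bank", "money", "loan"],
    ["bank", "money"],
    ["river", "shore", "water"],
    ["shore", "water"],
    ["money", "loan"],
]
df, pair_df, n_docs = build_counts(docs)

# One ambiguous source term with two candidates, one unambiguous term.
result = disambiguate([["bank", "shore"], ["money"]], df, pair_df, n_docs)
print(result)
```

In this toy collection "bank" co-occurs with "money" while "shore" does not, so "bank" is selected and both chosen terms receive the same weight. A real system would need a larger collection, a decision about the co-occurrence window, and smoothing for rare terms, none of which the abstract specifies.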