A statistical framework for query translation disambiguation

  • Authors:
  • Yi Liu;Rong Jin;Joyce Y. Chai

  • Affiliations:
  • Michigan State University, MI;Michigan State University, MI;Michigan State University, MI

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2006

Abstract

Resolving ambiguity in query translation is crucial to cross-language information retrieval (CLIR) because queries are short. The problem is even more challenging when only a bilingual dictionary is available, which is the focus of this work. In this paper, we present a statistical framework for dictionary-based CLIR that estimates the translation probabilities of query words from monolingual word co-occurrence statistics. We also present two realizations of the framework, the “maximum coherence model” and the “spectral query-translation model,” which use different metrics to measure the coherence between a translation of a query word and the theme of the entire query. Compared to previous work on dictionary-based CLIR, the proposed framework has three advantages: (1) translation probabilities are calculated explicitly to capture the uncertainty in translating queries; (2) translations of all query words are estimated simultaneously rather than independently; and (3) the formulated problem can be solved efficiently with a unique optimal solution. Empirical studies on Chinese-English CLIR using TREC datasets show that the proposed models achieve a relative improvement of 10% to 50% over other approaches that also exploit word co-occurrence statistics for query translation disambiguation.