Chinese OOV translation and post-translation query expansion in Chinese-English cross-lingual information retrieval

  • Authors:
  • Ying Zhang; Phil Vines; Justin Zobel

  • Affiliations:
  • School of Computer Science and Information Technology, RMIT University, Melbourne, Australia (all three authors)

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2005

Abstract

Cross-lingual information retrieval allows users to query mixed-language collections or to probe for documents written in an unfamiliar language. A major difficulty for cross-lingual information retrieval is the detection and translation of out-of-vocabulary (OOV) terms; for OOV terms in Chinese, another difficulty is segmentation. At NTCIR-4, we explored methods for translating and disambiguating OOV terms when using a Chinese query on an English collection. We have developed a new segmentation-free technique for automatic translation of Chinese OOV terms using the web. We have also investigated the effects of the distance factor and window size when using a hidden Markov model to provide disambiguation. Our experiments show that these methods significantly improve effectiveness; in conjunction with our post-translation query expansion technique, effectiveness approaches that of monolingual retrieval.
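
The abstract names a distance factor and a window size for hidden Markov model disambiguation but does not give the formulation. The following is a minimal illustrative sketch, not the authors' method. It assumes that co-occurrence between candidate translations is counted only within a fixed window, that each count is down-weighted exponentially with distance, and that a Viterbi search over a first-order chain picks the best translation sequence. The function names, the decay form, the smoothing constant, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def cooccurrence_scores(docs, window_size=4, distance_factor=0.5):
    """Distance-weighted co-occurrence counts within a window (assumed form)."""
    scores = defaultdict(float)
    for tokens in docs:
        for i, left in enumerate(tokens):
            # Only look at tokens at most `window_size` positions to the right.
            for j in range(i + 1, min(i + window_size + 1, len(tokens))):
                # Adjacent words get weight 1; the weight decays with distance.
                weight = math.exp(-distance_factor * (j - i - 1))
                scores[(left, tokens[j])] += weight
                scores[(tokens[j], left)] += weight
    return scores


def disambiguate(candidates, scores, smoothing=1e-6):
    """Viterbi search over a first-order chain of translation candidates."""
    # best maps each candidate to (log score of best path ending there, path).
    best = {t: (0.0, [t]) for t in candidates[0]}
    for options in candidates[1:]:
        step = {}
        for cur in options:
            step[cur] = max(
                (prev_score + math.log(scores.get((prev, cur), 0.0) + smoothing),
                 path + [cur])
                for prev, (prev_score, path) in best.items()
            )
        best = step
    return max(best.values())[1]


# Toy English collection and hypothetical candidate translations per query term.
docs = [
    "the central bank raised the interest rate".split(),
    "the river bank was flooded".split(),
    "interest rate set by the bank".split(),
]
candidates = [["bank", "shore"], ["interest", "hobby"]]

print(disambiguate(candidates, cooccurrence_scores(docs)))
# prints ['bank', 'interest']
```

In this sketch, shrinking `window_size` to 2 removes the only "bank"/"interest" co-occurrence from the toy collection, which shows how sensitive the choice of translation can be to the two parameters the abstract mentions.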