Issues in pre- and post-translation document expansion: untranslatable cognates and missegmented words

  • Authors: Gina-Anne Levow
  • Affiliations: University of Chicago, Chicago, IL
  • Venue: AsianIR '03 Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11
  • Year: 2003

Abstract

Query expansion by pseudo-relevance feedback is a well-established technique in both monolingual and cross-lingual information retrieval, enriching and disambiguating the typically terse queries provided by searchers. Comparable document-side expansion is a more recent development, motivated by error-prone transcription and translation processes in spoken document and cross-language retrieval. In the cross-language case, one can perform expansion before translation, after translation, or at both points. We investigate the relative impact of pre- and post-translation document expansion for cross-language spoken document retrieval in Mandarin Chinese. We find that post-translation expansion yields a highly significant improvement in retrieval effectiveness, while improvements due to pre-translation expansion, alone or in combination, do not reach significance. We identify two key factors, segmentation and translation, related to Chinese orthography that limit the effectiveness of pre-translation expansion in the Chinese-English case, while post-translation expansion yields its full benefit.
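
As a rough illustration of where expansion can sit relative to translation, the following is a minimal sketch and not the paper's system. It assumes documents are already segmented into term lists, uses a simple tf-idf overlap score in place of a real retrieval engine, and treats translation as a term-by-term lookup. The function names `expand_document` and `expand_around_translation`, the parameters `top_docs` and `top_terms`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def expand_document(doc_terms, collection, top_docs=2, top_terms=3):
    """Append feedback terms taken from the collection documents most similar to doc_terms."""
    n_docs = len(collection)
    # Document frequency of each term in the side collection.
    df = Counter(term for other in collection for term in set(other))

    def idf(term):
        # Smoothed inverse document frequency (an assumed weighting).
        return math.log((n_docs + 1) / (df[term] + 1)) + 1.0

    def score(other):
        # Treat the document as a query: sum tf * idf over its distinct terms.
        tf = Counter(other)
        return sum(tf[t] * idf(t) for t in set(doc_terms))

    ranked = sorted(collection, key=score, reverse=True)[:top_docs]

    # Weight candidate terms from the top-ranked documents by tf * idf.
    weights = Counter()
    for other in ranked:
        for term, count in Counter(other).items():
            if term not in doc_terms:
                weights[term] += count * idf(term)
    new_terms = [t for t, _ in weights.most_common(top_terms)]
    return list(doc_terms) + new_terms


def expand_around_translation(doc_terms, translate, source_side, target_side,
                              pre=True, post=True):
    """Apply expansion before and/or after a term-by-term translation."""
    terms = expand_document(doc_terms, source_side) if pre else list(doc_terms)
    # Terms the lookup cannot translate (None) are dropped in this sketch.
    translated = [t for t in (translate(term) for term in terms) if t is not None]
    return expand_document(translated, target_side) if post else translated


# Toy data, invented for illustration only.
english_side = [["stock", "market", "shares"], ["weather", "rain"],
                ["stock", "trading", "market"]]
chinese_side = [["gupiao", "shichang"]]
lexicon = {"gupiao": "stock"}  # "shichang" is deliberately missing

pre_only = expand_around_translation(["gupiao"], lexicon.get, chinese_side,
                                     english_side, pre=True, post=False)
post_only = expand_around_translation(["gupiao"], lexicon.get, chinese_side,
                                      english_side, pre=False, post=True)
```

In this toy setup the term added before translation is lost because the lookup has no entry for it, whereas terms added after translation survive. That loosely mirrors the translation factor the abstract names; segmentation errors are not modeled here.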