Comparing cross-language query expansion techniques by degrading translation resources

  • Authors:
  • Paul McNamee;James Mayfield

  • Affiliations:
  • The Johns Hopkins University Applied Physics Laboratory, Laurel, MD;The Johns Hopkins University Applied Physics Laboratory, Laurel, MD

  • Venue:
  • SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

The quality of translation resources is arguably the most important factor affecting the performance of a cross-language information retrieval system. While many investigations have explored the use of query expansion techniques to combat errors induced by translation, no study has yet examined the effectiveness of these techniques across resources of varying quality. This paper presents results using parallel corpora and bilingual wordlists that have been deliberately degraded prior to query translation. Across different languages, translingual resources, and degrees of resource degradation, pre-translation query expansion is tremendously effective. In several instances, pre-translation expansion results in better performance when no translations are available, than when an uncompromised resource is used without pre-translation expansion. We also demonstrate that post-translation expansion using relevance feedback can confer modest performance gains. Measuring the efficacy of these techniques with resources of different quality suggests an explanation for the conflicting reports that have appeared in the literature.