Exploiting comparable corpora for cross-language information retrieval

  • Authors: Fatiha Sadat
  • Affiliations: University of Quebec in Montreal, Computer Science Department, Montreal, QC, Canada
  • Venue: PRICAI'10: Proceedings of the 11th Pacific Rim International Conference on Trends in Artificial Intelligence
  • Year: 2010

Abstract

With the explosive growth of the World Wide Web, large-scale comparable corpora have become more abundant and accessible than parallel corpora. Strategies for extracting bilingual terminology from comparable texts therefore deserve more attention, both to enrich existing bilingual lexicons and thesauri and to enhance Cross-Language Information Retrieval. In this paper, we focus on enhancing Cross-Language Information Retrieval with a two-stage corpus-based translation model: bi-directional extraction of bilingual terminology from comparable corpora, followed by selection of the best translation alternatives on the basis of morphological knowledge. We evaluate the impact of comparable corpora on the performance of the Cross-Language Information Retrieval process. The results indicate that the effect is clearly positive, especially when the corpus-based model is linearly combined with bilingual dictionaries and applied to the Japanese-English language pair.
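
The abstract does not give the scoring formulas, so the following is only a minimal sketch of what a bi-directional check and a linear combination with a dictionary could look like. The function names, the product used for the bi-directional score, the weight `alpha`, and the toy scores are all assumptions made for illustration; they are not taken from the paper.

```python
def bidirectional_score(src_to_tgt, tgt_to_src, source, target):
    """Score a candidate pair by requiring support in both directions.

    Assumption: the two directional similarity scores are multiplied, so a
    pair found in only one direction gets a score of 0.0.
    """
    forward = src_to_tgt.get(source, {}).get(target, 0.0)
    backward = tgt_to_src.get(target, {}).get(source, 0.0)
    return forward * backward


def combine_scores(corpus_scores, dict_scores, alpha=0.5):
    """Linearly combine corpus-based and dictionary-based candidate scores.

    alpha weights the corpus-based score and (1 - alpha) the dictionary
    score; candidates missing from one resource contribute 0.0 there.
    Returns (candidate, score) pairs sorted by descending score.
    """
    candidates = set(corpus_scores) | set(dict_scores)
    combined = {
        term: alpha * corpus_scores.get(term, 0.0)
        + (1.0 - alpha) * dict_scores.get(term, 0.0)
        for term in candidates
    }
    # Break ties alphabetically so the ranking is deterministic.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy scores for the translations of one hypothetical source term.
    corpus = {"retrieval": 0.8, "search": 0.4}
    dictionary = {"retrieval": 0.5, "lookup": 0.6}
    for term, score in combine_scores(corpus, dictionary, alpha=0.5):
        print(f"{term}\t{score:.2f}")
```

In this sketch, a candidate supported by both resources outranks one supported by only a single resource, which is one plausible reading of why a combination with bilingual dictionaries might help retrieval.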