Filtering or adapting: two strategies to exploit noisy parallel corpora for cross-language information retrieval

  • Authors:
  • Lixin Shi; Jian-Yun Nie

  • Affiliations:
  • Université de Montréal, Centre-ville, Montréal, Québec, Canada; Université de Montréal, Centre-ville, Montréal, Québec, Canada

  • Venue:
  • CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
  • Year:
  • 2006

Abstract

Parallel corpora have been widely used for cross-language information retrieval (CLIR). However, previous studies have focused only on truly parallel corpora. In this paper, we examine two possible approaches to exploiting noisy corpora: filtering out the noise from the corpora, or adapting the training process of the translation model to the noisy corpora. Our experiments show that the second approach is better suited to CLIR.