Filtering or adapting: two strategies to exploit noisy parallel corpora for cross-language information retrieval

Authors:
Lixin Shi;Jian-Yun Nie
Affiliations:
Université de Montréal, Centre-ville, Montréal, Québec, Canada;Université de Montréal, Centre-ville, Montréal, Québec, Canada
Venue:
CIKM '06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management
Year:
2006

Abstract

Noisy parallel corpora have been widely used for cross-language information retrieval (CLIR). However, previous studies have focused only on truly parallel corpora. In this paper, we examine two possible approaches to exploiting noisy corpora: filtering the noise out of the corpora, or adapting the training process of the translation model to the noisy corpora. Our experiments show that the second approach is better suited to CLIR.