Exploiting comparable corpora for cross-language information retrieval

Authors:
Fatiha Sadat
Affiliations:
University of Quebec in Montreal, Computer Science Department, Montreal, QC, Canada
Venue:
PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Year:
2010

Abstract

With the explosive growth of the World Wide Web, large-scale comparable corpora have become more abundant and accessible than parallel corpora. Strategies for extracting bilingual terminology from comparable texts therefore deserve more attention, both to enrich existing bilingual lexicons and thesauri and to enhance Cross-Language Information Retrieval. In the present paper, we focus on enhancing Cross-Language Information Retrieval with a two-stage corpus-based translation model: the first stage performs bi-directional extraction of bilingual terminology from comparable corpora, and the second selects the best translation alternatives on the basis of morphological knowledge. We evaluate the impact of comparable corpora on the performance of the Cross-Language Information Retrieval process, and the results indicate that the effect is clearly positive, especially when using a linear combination with bilingual dictionaries and for the Japanese-English language pair.
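The linear combination with bilingual dictionaries mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption made for illustration: the function name `combine_translation_scores`, the interpolation weight `weight`, the score tables, and the choice to give a candidate missing from one resource a score of zero in that resource. It should not be read as the paper's actual model.

```python
def combine_translation_scores(corpus_scores, dictionary_scores, weight=0.5):
    """Rank translation candidates for one source term by linear interpolation.

    corpus_scores:     hypothetical scores from the comparable-corpus model
    dictionary_scores: hypothetical scores from a bilingual dictionary
    weight:            assumed share given to the corpus model, in [0, 1]
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")

    # A candidate may appear in only one resource; treat the missing score as 0.
    candidates = set(corpus_scores) | set(dictionary_scores)
    combined = {
        target: weight * corpus_scores.get(target, 0.0)
        + (1.0 - weight) * dictionary_scores.get(target, 0.0)
        for target in candidates
    }

    # Highest combined score first; ties are broken alphabetically.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Toy scores for a single source term (illustrative values, not from the paper).
corpus = {"retrieval": 0.6, "search": 0.4}
dictionary = {"retrieval": 1.0}
ranked = combine_translation_scores(corpus, dictionary, weight=0.5)
best_translation = ranked[0][0]
```

Setting `weight` to 1.0 falls back to the corpus-derived scores alone and 0.0 to the dictionary alone, which makes it straightforward to compare each resource against the combination.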