Using comparable corpora to improve the effectiveness of cross-language information retrieval

Authors:
Fatiha Sadat
Affiliations:
University of Quebec in Montreal, Computer Science Department, Montreal, QC, Canada
Venue:
IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
Year:
2010

Abstract

With the explosive growth of the World Wide Web, large-scale comparable corpora have become more abundant and accessible than parallel corpora. In Cross-Language Information Retrieval, limited translation resources, together with the ambiguity that arises when query terms fail to translate, are largely responsible for large drops in effectiveness below monolingual performance. Strategies for extracting bilingual terminology from comparable texts therefore deserve more attention, both to enrich existing bilingual lexicons and thesauri and to enhance Cross-Language Information Retrieval. In this paper, we focus on enhancing Cross-Language Information Retrieval with a two-stage corpus-based translation model: the first stage extracts bilingual terminology from comparable corpora in both directions, and the second selects the best translation alternatives on the basis of morphological knowledge. We evaluate the impact of comparable corpora on Cross-Language Information Retrieval performance, and the results indicate that the effect is clearly positive, especially when the corpus-based model is used in linear combination with bilingual dictionaries on the Japanese-English language pair.
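The abstract does not give the formula used for the linear combination, so the following is only a minimal sketch of the general idea, not the paper's actual model. It assumes each resource (comparable corpus, bilingual dictionary) yields a non-negative score per candidate translation, normalizes each table to sum to 1, and interpolates them with a single weight. The names `normalize`, `combine_linear`, `best_translations`, and `weight`, as well as the toy scores, are hypothetical and chosen for illustration.

```python
from typing import Dict, List

Scores = Dict[str, float]


def normalize(scores: Scores) -> Scores:
    """Scale non-negative scores so they sum to 1; empty or all-zero input gives {}."""
    total = sum(scores.values())
    if total <= 0:
        return {}
    return {term: value / total for term, value in scores.items()}


def combine_linear(corpus_scores: Scores, dictionary_scores: Scores,
                   weight: float = 0.5) -> Scores:
    """Interpolate two score tables: weight * corpus + (1 - weight) * dictionary.

    `weight` is an assumed tuning parameter, not a value taken from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    corpus = normalize(corpus_scores)
    dictionary = normalize(dictionary_scores)
    candidates = set(corpus) | set(dictionary)
    return {
        term: weight * corpus.get(term, 0.0)
        + (1.0 - weight) * dictionary.get(term, 0.0)
        for term in candidates
    }


def best_translations(combined: Scores, top_k: int = 2) -> List[str]:
    """Return the top_k candidates by combined score, ties broken alphabetically."""
    return sorted(combined, key=lambda term: (-combined[term], term))[:top_k]


if __name__ == "__main__":
    # Made-up scores for the candidate translations of one source query term.
    corpus_scores = {"bank": 3.0, "shore": 1.0}
    dictionary_scores = {"bank": 1.0, "financial institution": 1.0}
    combined = combine_linear(corpus_scores, dictionary_scores, weight=0.5)
    print(best_translations(combined, top_k=2))
```

In this sketch a candidate proposed by only one resource still receives a score, so the corpus-derived candidates can fill gaps in dictionary coverage, while candidates supported by both resources rank highest. How the weight would be chosen is left open here.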