Learning bilingual translations from comparable corpora to cross-language information retrieval: hybrid statistics-based and linguistics-based approach

Authors:
Fatiha Sadat;Masatoshi Yoshikawa;Shunsuke Uemura
Affiliations:
Nara Institute of Science and Technology, Ikoma-shi Nara, Japan;Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan;Nara Institute of Science and Technology, Ikoma-shi, Nara, Japan
Venue:
AsianIR '03 Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11
Year:
2003

Citing 14

Using statistical testing in the evaluation of retrieval experiments
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval

Word sense disambiguation using a second language monolingual corpus
Computational Linguistics

Automatic routing and retrieval using Smart: TREC-2
TREC-2 Proceedings of the second conference on Text retrieval conference

Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval

Cross-language information retrieval: experiments based on CLEF 2000 corpora
Information Processing and Management: an International Journal

Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I

Machine transliteration
Computational Linguistics

Extraction of lexical translations from non-aligned corpora
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2

Automatic identification of word translations from unrelated English and German corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics

An approach based on multilingual thesauri and model combination for bilingual lexicon extraction
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1

Bilingual terminology acquisition from comparable corpora and phrasal translation to cross-language information retrieval
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2

The SMART Retrieval System—Experiments in Automatic Document Processing
The SMART Retrieval System—Experiments in Automatic Document Processing

Learning a translation lexicon from monolingual corpora
ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9

Cited 9

Stemming to improve translation lexicon creation form bitexts
Information Processing and Management: an International Journal

Automatic extraction of bilingual word pairs using inductive chain learning in various languages
Information Processing and Management: an International Journal

Translation disambiguation for cross-language information retrieval using context-based translation probability
Journal of Information Science

MARS: multilingual access and retrieval system with enhanced query translation and document retrieval
ACLDemos '09 Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

Brains, not brawn: The use of “smart” comparable corpora in bilingual terminology mining
ACM Transactions on Speech and Language Processing (TSLP)

Exploiting comparable corpora for cross-language information retrieval
PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence

Using comparable corpora to improve the effectiveness of cross-language information retrieval
IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing

EM-based hybrid model for bilingual terminology extraction from comparable corpora
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

Semantic relations in bilingual lexicons
ACM Transactions on Speech and Language Processing (TSLP)

Abstract

Recent years have seen increased interest in the construction and use of large corpora. With this interest has come an expansion of their application to knowledge acquisition and bilingual terminology extraction. This paper presents an approach to bilingual lexicon extraction from non-aligned comparable corpora, combined with linguistics-based pruning, and evaluates it on Cross-Language Information Retrieval. We propose and explore a two-stage translation model for acquiring bilingual terminology from comparable corpora, followed by disambiguation and selection of the best translation alternatives on the basis of their morphological knowledge. Evaluations on a large-scale Japanese-English test collection, using different weighting schemes of the SMART retrieval system, confirmed the effectiveness of the proposed combination of the two-stage comparable corpora model and linguistics-based pruning for Cross-Language Information Retrieval.
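
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a context-vector approach: words from each corpus are represented by their co-occurrence counts, source vectors are projected through a small seed dictionary, and candidates are ranked by cosine similarity. Reading "two-stage" as a bidirectional check (source to target, then target back to source) and "linguistics-based pruning" as a part-of-speech match are both assumptions made for illustration. All function names, the romanized toy sentences, the seed dictionary, and the POS tags are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words co-occurring within `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def project(vector, seed):
    """Translate a context vector through a seed dictionary; unknown words are dropped."""
    out = Counter()
    for word, count in vector.items():
        if word in seed:
            out[seed[word]] += count
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(word, from_vecs, to_vecs, seed):
    """Return the word in `to_vecs` most similar to `word`, or None if nothing overlaps."""
    projected = project(from_vecs[word], seed)
    scored = [(cosine(projected, vec), cand) for cand, vec in to_vecs.items()]
    score, cand = max(scored, default=(0.0, None))
    return cand if score > 0 else None


def extract_pairs(src_sents, tgt_sents, seed, src_pos, tgt_pos):
    """Keep pairs that agree in both directions and share a POS tag (assumed pruning rule)."""
    src_vecs = context_vectors(src_sents)
    tgt_vecs = context_vectors(tgt_sents)
    reverse_seed = {t: s for s, t in seed.items()}  # assumes a one-to-one seed dictionary
    pairs = []
    for word in src_vecs:
        if word in seed:
            continue  # already translated by the seed dictionary
        cand = best_match(word, src_vecs, tgt_vecs, seed)
        if cand is None:
            continue
        # Second direction: the candidate must map back to the same source word.
        if best_match(cand, tgt_vecs, src_vecs, reverse_seed) != word:
            continue
        # Assumed linguistic pruning: discard pairs whose POS tags differ.
        if src_pos.get(word) != tgt_pos.get(cand):
            continue
        pairs.append((word, cand))
    return pairs


# Hypothetical toy data: romanized Japanese source, English target.
src_sents = [["ginkou", "kinri", "ageru"], ["kinri", "takai"]]
tgt_sents = [["bank", "interest", "raises"], ["interest", "high"]]
seed = {"ginkou": "bank", "ageru": "raises", "takai": "high"}
src_pos = {"kinri": "NOUN"}
tgt_pos = {"interest": "NOUN"}

print(extract_pairs(src_sents, tgt_sents, seed, src_pos, tgt_pos))
# prints [('kinri', 'interest')]
```

On this toy input the only source word missing from the seed dictionary is "kinri", and its projected context overlaps most with that of "interest" in both directions. A real system would work with large corpora, a morphological analyzer for the POS tags, and a weighting scheme rather than raw counts.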