Enhancing cross-language information retrieval by an automatic acquisition of bilingual terminology from comparable corpora

Authors:
Fatiha Sadat;Masatoshi Yoshikawa;Shunsuke Uemura
Affiliations:
Nara Institute of Science and Technology, Ikoma, Nara, Japan;Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan;Nara Institute of Science and Technology, Ikoma, Nara, Japan
Venue:
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Year:
2003

Citing 5
Cited 6

A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora

AMTA '98 Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup
Machine transliteration

Computational Linguistics
Automatic identification of word translations from unrelated English and German corpora

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
An approach based on multilingual thesauri and model combination for bilingual lexicon extraction

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Learning a translation lexicon from monolingual corpora

ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9

Learning bilingual translations from comparable corpora to cross-language information retrieval: hybrid statistics-based and linguistics-based approach

AsianIR '03 Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11
Leveraging reusability: cost-effective lexical acquisition for large-scale ontology translation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Exploiting comparable corpora for cross-language information retrieval

PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Using comparable corpora to improve the effectiveness of cross-language information retrieval

IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
A language modeling approach for extracting translation knowledge from comparable corpora

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Mining a Persian-English comparable corpus for cross-language information retrieval

Information Processing and Management: an International Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluations on Cross-Language Information Retrieval. We explore a bi-directional extraction of bilingual terminology primarily from comparable corpora. A combined statistics-based and linguistics-based model to select best translation candidates to phrasal translation is proposed. Evaluations using a large test collection for Japanese-English revealed the proposed combination of bi-directional comparable corpora, bilingual dictionaries and transliteration, augmented with linguistics-based pruning to be highly effective in Cross-Language Information Retrieval.