A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora
AMTA '98 Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup
Computational Linguistics
Automatic identification of word translations from unrelated English and German corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
An approach based on multilingual thesauri and model combination for bilingual lexicon extraction
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Learning a translation lexicon from monolingual corpora
ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9
AsianIR '03 Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11
Leveraging reusability: cost-effective lexical acquisition for large-scale ontology translation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Exploiting comparable corpora for cross-language information retrieval
PRICAI'10 Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence
Using comparable corpora to improve the effectiveness of cross-language information retrieval
IceTAL'10 Proceedings of the 7th international conference on Advances in natural language processing
A language modeling approach for extracting translation knowledge from comparable corpora
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Mining a Persian-English comparable corpus for cross-language information retrieval
Information Processing and Management: an International Journal
Hi-index | 0.00 |
This paper presents an approach to bilingual lexicon extraction from comparable corpora and evaluations on Cross-Language Information Retrieval. We explore a bi-directional extraction of bilingual terminology primarily from comparable corpora. A combined statistics-based and linguistics-based model to select best translation candidates to phrasal translation is proposed. Evaluations using a large test collection for Japanese-English revealed the proposed combination of bi-directional comparable corpora, bilingual dictionaries and transliteration, augmented with linguistics-based pruning to be highly effective in Cross-Language Information Retrieval.