A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora
AMTA '98 Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup
Automatic identification of word translations from unrelated English and German corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Translating named entities using monolingual and bilingual resources
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Improved statistical alignment models
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Learning a translation lexicon from monolingual corpora
ULA '02 Proceedings of the ACL-02 workshop on Unsupervised lexical acquisition - Volume 9
Mining new word translations from comparable corpora
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Identification of confusable drug names: a new approach and evaluation methodology
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Bilingual lexicon generation using non-aligned signatures
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Bilingual lexicon extraction from comparable corpora using in-domain terms
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Building and using comparable corpora for domain-specific bilingual lexicon extraction
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
hrWaC and slWac: compiling web corpora for Croatian and Slovene
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue
An approach to acquire word translations from non-parallel texts
EPIA'05 Proceedings of the 12th Portuguese conference on Progress in Artificial Intelligence
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Hi-index | 0.00 |
In this paper we present an approach to bootstrap a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from identical words in both languages and extend it with context-based cognates and translation candidates of the most frequent words. By enlarging the seed dictionary for only 7% we were able to improve the baseline precision from 0.597 to 0.731 on the mean reciprocal rank for the ten top-ranking translation candidates with a 50.4% recall on the gold standard of 500 entries.