This paper presents a series of experiments on inducing and evaluating domain-specific bilingual lexica from comparable corpora. First, a small English-Slovene comparable corpus was manually constructed from health magazines and then used to compile a large comparable corpus on health-related topics from web corpora. Next, a bilingual lexicon for the domain was extracted from the corpus by comparing context vectors in the two languages. Evaluation shows that a 2-way translation of context vectors significantly improves the precision of the extracted translation equivalents. We also show that enlarging the corpus for only one language is sufficient to obtain higher recall, and that the number of new words grows linearly with corpus size. Finally, we demonstrate that when the frequency threshold for context vectors is lowered, precision drops much more slowly than recall increases.
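
As a rough illustration of what comparing context vectors across two languages can look like, the sketch below builds co-occurrence vectors, projects a source-language vector into the target language through a seed dictionary, and ranks target words by cosine similarity. It is a minimal sketch under stated assumptions, not the method used in the paper: the abstract does not specify the feature weighting, the similarity measure, or the details of the 2-way translation, so raw counts, cosine similarity, and a single translation direction are assumed here. All function names, the window size, the seed dictionary, and the toy lemmatized sentences are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words co-occurring within `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vector, seed_dict):
    """Project a source-language vector into the target language.

    Features that are missing from the seed dictionary are dropped.
    """
    translated = Counter()
    for feature, weight in vector.items():
        if feature in seed_dict:
            translated[seed_dict[feature]] += weight
    return translated


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(word, src_vectors, tgt_vectors, seed_dict, top_n=3):
    """Return the top_n target words most similar to the translated vector."""
    translated = translate_vector(src_vectors[word], seed_dict)
    scored = [(cosine(translated, vec), cand) for cand, vec in tgt_vectors.items()]
    # Sort by descending score, breaking ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(cand, score) for score, cand in scored[:top_n]]


# Toy lemmatized data, invented purely for illustration.
english = [["doctor", "prescribe", "medicine"], ["patient", "take", "medicine"]]
slovene = [["zdravnik", "predpisati", "zdravilo"], ["bolnik", "vzeti", "zdravilo"]]
seed = {"doctor": "zdravnik", "patient": "bolnik",
        "prescribe": "predpisati", "take": "vzeti"}

en_vectors = context_vectors(english)
sl_vectors = context_vectors(slovene)
candidates = rank_candidates("medicine", en_vectors, sl_vectors, seed)
print(candidates[0][0])
```

In this toy setting the word "medicine" is absent from the seed dictionary, yet its translated context matches that of "zdravilo", which therefore ranks first. A frequency threshold of the kind the abstract mentions could be applied by skipping words whose total count falls below a chosen cutoff before ranking.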