Bilingual lexicons are fundamental resources. Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs. Lexicons can instead be generated from non-parallel corpora or through a pivot language, but such lexicons are noisy. We present an algorithm for generating a high-quality lexicon from a noisy one, which requires only an independent corpus for each language. Our algorithm introduces non-aligned signatures (NAS), a cross-lingual word context similarity score that avoids the over-constrained and inefficient nature of alignment-based methods. We use NAS to eliminate incorrect translations from the generated lexicon. We evaluate our method by improving the quality of noisy Spanish-Hebrew lexicons generated from two pivot English lexicons. Our algorithm substantially outperforms other lexicon generation methods.
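The abstract does not spell out how NAS is computed, so the following is only a minimal sketch under stated assumptions. It assumes that each word has a "signature" (a small set of its most strongly associated context words, taken from that language's own corpus) and that the score is the fraction of source-signature words with at least one noisy-lexicon translation in the candidate's signature. The names `nas_score` and `best_translation`, and the choice to keep only the top-scoring candidate, are hypothetical and not taken from the paper.

```python
def nas_score(src_signature, tgt_signature, noisy_lexicon):
    """Assumed scoring rule: share of source-signature words that have
    at least one noisy-lexicon translation in the target signature.
    No word-to-word alignment between the two signatures is computed."""
    if not src_signature:
        return 0.0
    matched = sum(
        1 for u in src_signature
        if noisy_lexicon.get(u, set()) & tgt_signature
    )
    return matched / len(src_signature)


def best_translation(candidates, src_signature, tgt_signatures, noisy_lexicon):
    """Keep the candidate translation whose signature scores highest
    (an assumed filtering policy). Ties are broken alphabetically;
    returns None when there are no candidates."""
    return max(
        sorted(candidates),
        key=lambda t: nas_score(
            src_signature, tgt_signatures.get(t, set()), noisy_lexicon
        ),
        default=None,
    )
```

In this sketch the noisy lexicon is used twice: it supplies the candidate translations to filter, and it links the two signatures. Because only set overlap is checked, a signature word with several noisy translations still counts once, which is one way to read "non-aligned".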