Existing dictionaries can be effectively enlarged by finding the translations of single words in comparable corpora. The idea rests on the assumption that similar words have similar contexts across languages. However, previous research either uses a simple bag-of-words model to capture the lexical context, or assumes that sufficient context information can be captured by the successor and predecessor in the dependency tree. While the latter may be sufficient for a close language pair, we observed that it is insufficient if the languages differ significantly, as is the case for Japanese and English. Given a query word, our proposed method uses one statistical model to extract relevant words that tend to co-occur in the same sentence, and three further statistical models to extract relevant predecessors, successors and siblings in the dependency tree. We then combine the information gained from the four statistical models and compare this lexical-dependency information across English and Japanese to identify likely translation candidates. Experiments on openly accessible comparable corpora verify that our proposed method yields a statistically significant increase in Top 1 accuracy of around 13 percentage points, to 53%, and raises Top 20 accuracy to 91%.
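The comparison step described above can be illustrated with a minimal sketch. The abstract does not specify the statistical models, the weighting, or how the four sources are combined, so everything below is an assumption made for illustration: raw counts stand in for the statistical models, a hypothetical seed dictionary (`seed_dict`) maps Japanese context words to English, cosine similarity compares contexts, and an unweighted average combines the four relations. All names and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt

# The four context types named in the abstract.
RELATIONS = ("sentence", "predecessor", "successor", "sibling")


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translate_context(vec, seed_dict):
    """Map context words into the target language; drop words not in the seed dictionary."""
    out = Counter()
    for word, weight in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += weight
    return out


def rank_candidates(query_ctx, candidate_ctx, seed_dict):
    """Rank candidates by the average similarity over the four relations.

    Averaging is an assumption of this sketch, not the combination used in the paper.
    """
    scores = {}
    for cand, ctx in candidate_ctx.items():
        sims = [
            cosine(translate_context(query_ctx.get(r, {}), seed_dict), ctx.get(r, {}))
            for r in RELATIONS
        ]
        scores[cand] = sum(sims) / len(RELATIONS)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): contexts of a Japanese query word meaning "car".
seed_dict = {"道路": "road", "運転": "drive", "赤い": "red", "バス": "bus"}
query_ctx = {
    "sentence": {"道路": 3, "運転": 2},
    "predecessor": {"運転": 2},
    "successor": {"赤い": 1},
    "sibling": {"バス": 1},
}
candidate_ctx = {
    "car": {
        "sentence": {"road": 4, "drive": 3},
        "predecessor": {"drive": 5},
        "successor": {"red": 1, "fast": 1},
        "sibling": {"bus": 2},
    },
    "apple": {
        "sentence": {"eat": 3, "red": 1},
        "predecessor": {"eat": 4},
        "successor": {"red": 2},
        "sibling": {"orange": 1},
    },
}

ranking = rank_candidates(query_ctx, candidate_ctx, seed_dict)
```

In this toy setting, "car" matches the query in all four relations while "apple" matches only in one, so `ranking` places "car" first. A real system would replace the raw counts with the output of the statistical models and would likely weight the four relations differently.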