Extracting Translation Equivalents from Bilingual Comparable Corpora

Authors:
Hiroyuki Kaji
Affiliations:
The author is with Central Research Laboratory, Hitachi, Ltd., Kokubunji-shi, 185-8601 Japan. E-mail: kaji@crl.hitachi.co.jp
Venue:
IEICE - Transactions on Information and Systems
Year:
2005

Abstract

An improved method was developed for extracting translation equivalents from bilingual comparable corpora according to contextual similarity. The method has two main features. First, a seed bilingual lexicon, which is used to bridge contexts in different languages, is adapted to the corpora from which translation equivalents are to be extracted. Second, contextual similarity is evaluated by combining similarity measures defined in opposite directions. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora, together with the EDR bilingual dictionary, demonstrated the effectiveness of the method: it produced lists of candidate translation equivalents with an accuracy of around 30% for frequently occurring unknown words. The method thus proved useful for improving the coverage of a bilingual lexicon.
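
The second feature, combining similarity measures defined in opposite directions, can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the two directed measures (the share of context weight on one side that the seed lexicon links to the other side), the harmonic mean used to combine them, the function names, and the toy romanized words. The corpus-specific adaptation of the seed lexicon is not modelled.

```python
# Illustrative sketch only: the measures and the combination rule are
# assumptions, not the formulas used in the paper.

def forward_similarity(src_ctx, tgt_ctx, seed_lexicon):
    """Share of the source word's context weight whose seed translations
    appear in the target word's context."""
    total = sum(src_ctx.values())
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for word, weight in src_ctx.items()
        if any(t in tgt_ctx for t in seed_lexicon.get(word, ()))
    )
    return matched / total


def backward_similarity(src_ctx, tgt_ctx, seed_lexicon):
    """Share of the target word's context weight that is a seed translation
    of some word in the source word's context."""
    total = sum(tgt_ctx.values())
    if total == 0:
        return 0.0
    reachable = {t for word in src_ctx for t in seed_lexicon.get(word, ())}
    matched = sum(w for word, w in tgt_ctx.items() if word in reachable)
    return matched / total


def combined_similarity(src_ctx, tgt_ctx, seed_lexicon):
    """Combine both directions with a harmonic mean (an assumed choice)."""
    fwd = forward_similarity(src_ctx, tgt_ctx, seed_lexicon)
    bwd = backward_similarity(src_ctx, tgt_ctx, seed_lexicon)
    if fwd + bwd == 0:
        return 0.0
    return 2 * fwd * bwd / (fwd + bwd)


def rank_candidates(src_ctx, candidates, seed_lexicon):
    """Return (candidate, score) pairs sorted by descending combined score."""
    scored = [
        (word, combined_similarity(src_ctx, ctx, seed_lexicon))
        for word, ctx in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: context words mapped to co-occurrence weights (all made up).
    seed = {"kabu": ["stock", "share"], "shijo": ["market"]}
    unknown_word_ctx = {"kabu": 3, "shijo": 2, "tenki": 1}
    candidates = {
        "equity": {"stock": 4, "market": 2, "price": 2},
        "weather": {"rain": 5, "market": 1},
    }
    for word, score in rank_candidates(unknown_word_ctx, candidates, seed):
        print(f"{word}\t{score:.3f}")
```

Requiring both directions to agree penalizes a candidate whose context merely contains a few words that happen to match. In the toy data, "weather" shares only "market" with the unknown word, so both of its directed scores are low and it ranks below "equity".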