Learning method for automatic acquisition of translation knowledge

Authors:
Hiroshi Echizen-ya; Kenji Araki; Yoshio Momouchi
Affiliations:
Dept. of Electronics and Information, Hokkai-Gakuen University, Sapporo, Japan; Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan; Dept. of Electronics and Information, Hokkai-Gakuen University, Sapporo, Japan
Venue:
KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part II
Year:
2005

Abstract

This paper presents a new learning method for automatically acquiring translation knowledge from parallel corpora, and applies it to the automatic extraction of bilingual word pairs. Bilingual word pairs are generally extracted from parallel corpora using similarity measures. However, similarity measures alone are insufficient because of the sparse data problem: words that occur only a few times in a corpus provide too little co-occurrence evidence for reliable similarity scores. Our learning method rests on the following presumption: within local parts of bilingual sentence pairs, the equivalents of the words adjacent to the source-language word of a bilingual word pair are also adjacent to the target-language word of that pair. Our method acquires such adjacency information automatically. We applied the method to systems based on various similarity measures and confirmed its effectiveness.
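To make the contrast between similarity measures and adjacency evidence concrete, here is a minimal sketch on a toy corpus. It rests on explicit assumptions. The Dice coefficient stands in for "a similarity measure"; the abstract says the method was applied to several measures but does not name them here. `adjacency_candidates` is a hypothetical, simplified reading of the adjacency presumption and is not the authors' learning algorithm. All function names and the toy corpus are invented for illustration.

```python
from collections import Counter
from itertools import product

# A sentence pair is (source tokens, target tokens).
Corpus = list[tuple[list[str], list[str]]]


def dice_scores(corpus: Corpus) -> dict[tuple[str, str], float]:
    """Dice coefficient 2*f(s,t) / (f(s) + f(t)), counted per sentence pair.

    Stand-in similarity measure (an assumption, not necessarily one the paper used).
    """
    src_freq: Counter = Counter()
    tgt_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for src, tgt in corpus:
        s_set, t_set = set(src), set(tgt)
        src_freq.update(s_set)                   # sentence-level frequency of s
        tgt_freq.update(t_set)                   # sentence-level frequency of t
        pair_freq.update(product(s_set, t_set))  # co-occurrence of (s, t)
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def neighbours(tokens: list[str], i: int) -> list[str]:
    """Tokens immediately left and right of position i, if present."""
    return [tokens[k] for k in (i - 1, i + 1) if 0 <= k < len(tokens)]


def adjacency_candidates(corpus: Corpus, known: set[tuple[str, str]]) -> Counter:
    """Hypothetical illustration of the adjacency presumption.

    Wherever a known pair (s, t) occurs in a sentence pair, pair each
    neighbour of s with each neighbour of t and count how often each
    candidate pair is proposed.
    """
    counts: Counter = Counter()
    for src, tgt in corpus:
        for s, t in known:
            for i in (k for k, w in enumerate(src) if w == s):
                for j in (k for k, w in enumerate(tgt) if w == t):
                    counts.update(product(neighbours(src, i), neighbours(tgt, j)))
    return counts


if __name__ == "__main__":
    corpus = [
        (["the", "red", "car"], ["la", "voiture", "rouge"]),
        (["the", "red", "book"], ["le", "livre", "rouge"]),
        (["a", "car"], ["une", "voiture"]),
    ]
    dice = dice_scores(corpus)
    # Dice cannot separate "red" from "the" as the equivalent of "rouge".
    print(dice[("red", "rouge")], dice[("the", "rouge")])  # 1.0 1.0
    cands = adjacency_candidates(corpus, {("car", "voiture")})
    print(("red", "rouge") in cands, ("the", "rouge") in cands)  # True False
```

On this toy corpus, "the" and "red" co-occur with "rouge" in exactly the same sentence pairs, so Dice scores them identically. The adjacency evidence from the known pair (car, voiture) proposes (red, rouge) but not (the, rouge). The sketch pairs both left and right neighbours because word order can differ between languages; it is far noisier than a method that restricts what counts as a "local part" of a sentence pair.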