Bilingual knowledge acquisition from Korean-English parallel corpus using alignment method: Korean-English alignment at word and phrase level

Authors:
Jung H. Shin;Young S. Han;Key-Sun Choi
Affiliations:
Korea Advanced Institute of Science and Technology, Taejon, Korea;Suwon University, Kyungki, Korea;Korea Advanced Institute of Science and Technology, Taejon, Korea
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Year:
1996

Abstract

This paper proposes a method for aligning a Korean-English parallel corpus. The structural dissimilarity between Korean and Indo-European languages requires more flexible measures for evaluating alignment candidates between bilingual units than those used for pairs of Indo-European languages. The flexible measure is intended to capture the dependency between bilingual items that can occur in different units according to different ordering rules. The proposed method takes phrases as the alignment unit, a departure from existing methods that take words as the unit. Phrasal alignment avoids the problem of alignment units and alleviates the problem of ordering mismatch. The parameters are estimated using the EM algorithm, and the proposed alignment algorithm is based on dynamic programming. In experiments carried out on 253,000 English words and their Korean translations, the proposed method achieved 68.7% accuracy at the phrase level and 89.2% accuracy with the bilingual dictionary induced from the alignment. The resulting alignment may yield richer bilingual data than can be derived from word-level alignments alone.
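
The abstract states only that parameters are estimated with the EM algorithm; it does not give the model. As a rough illustration of what EM estimation of bilingual correspondences can look like, the sketch below uses a simple IBM Model 1 style translation table, which is an assumption here and not the authors' phrase-level measure. The function names, the romanized Korean tokens, and the three-pair toy corpus are all hypothetical.

```python
from collections import defaultdict


def train_translation_table(pairs, iterations=20):
    """Estimate t(e | k) by EM from (korean_units, english_units) pairs.

    Assumption: an IBM Model 1 style model, used only for illustration;
    the paper's own alignment measure is not given in the abstract.
    """
    english_vocab = {e for _, english in pairs for e in english}
    uniform = 1.0 / len(english_vocab)
    t = defaultdict(lambda: uniform)  # t[(e, k)], uniform starting point

    for _ in range(iterations):
        counts = defaultdict(float)  # expected count of each (e, k) link
        totals = defaultdict(float)  # expected count of k taking part in a link
        # E-step: share each English unit among the Korean units of its pair.
        for korean, english in pairs:
            for e in english:
                norm = sum(t[(e, k)] for k in korean)
                for k in korean:
                    frac = t[(e, k)] / norm
                    counts[(e, k)] += frac
                    totals[k] += frac
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(float, {(e, k): c / totals[k]
                                for (e, k), c in counts.items()})
    return t


def best_translation(t, korean_unit):
    """Return the English unit with the highest t(e | korean_unit)."""
    candidates = {e: p for (e, k), p in t.items() if k == korean_unit}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus (romanized Korean, invented for this sketch).
toy_pairs = [
    (["keun", "jip"], ["big", "house"]),
    (["keun", "chaek"], ["big", "book"]),
    (["jageun", "chaek"], ["small", "book"]),
]

table = train_translation_table(toy_pairs)
for unit in ["keun", "jip", "chaek", "jageun"]:
    print(unit, "->", best_translation(table, unit))
```

The induced table plays the role of a small bilingual dictionary. A dynamic-programming aligner, as mentioned in the abstract, would then use such scores to choose the best pairing of units within each sentence pair; that step is not shown here.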