Structural feature selection for English-Korean statistical machine translation

Authors:
Seonho Kim; Juntae Yoon; Mansuk Song
Affiliations:
Yonsei University, Seoul, Korea (all three authors)
Venue:
COLING '00: Proceedings of the 18th Conference on Computational Linguistics, Volume 1
Year:
2000

Abstract

When aligning texts in very different languages such as Korean and English, structural features beyond the word or phrase level provide useful information. In this paper, we present a method for selecting structural features of the two languages, from which we construct a model that assigns conditional probabilities to corresponding tag sequences in bilingual English-Korean corpora. For tag sequence mapping between the two languages, we first define a structural feature function that represents statistical properties of the empirical distribution of a set of training samples. The system, based on the maximum entropy concept, selects only those features that produce large increases in the log-likelihood of the training samples. These structurally mapped features serve as more informative knowledge for statistical machine translation between English and Korean. This information can also help reduce the parameter space of statistical alignment by eliminating syntactically unlikely alignments.
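The selection step described above can be illustrated with a minimal Python sketch. It follows the greedy, gain-based feature induction for conditional maximum entropy models described by Berger, Della Pietra and Della Pietra (1996). Every concrete detail is an assumption made for illustration and should not be read as the paper's method. These assumptions include the toy corpus and tag names, the hypothetical feature template `fires` (an English tag co-occurring with a Korean tag), the use of the Korean tag sequences seen in training as the candidate outputs, and the bisection line search that fits each candidate's weight. The sketch also does not re-estimate earlier weights after each feature is added.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (English tag sequence, Korean tag sequence) pairs.
# Tag names are illustrative only, not the tag sets used in the paper.
TRAIN = [
    (("PRP", "VBD", "NN"), ("NP", "JKS", "NNG", "JKO", "VV")),
    (("PRP", "VBD", "NN"), ("NP", "JKS", "NNG", "JKO", "VV")),
    (("DT", "NN", "VBZ", "JJ"), ("NNG", "JKS", "VA")),
    (("DT", "NN", "VBZ", "JJ"), ("NNG", "JKS", "VA")),
    (("PRP", "VBZ", "NN"), ("NP", "JKS", "NNG", "JKO", "VV")),
]


def fires(feat, x, y):
    """Hypothetical structural feature: English tag e is in x and Korean tag k is in y."""
    e_tag, k_tag = feat
    return 1.0 if e_tag in x and k_tag in y else 0.0


def model_probs(weights, x, ys):
    """Conditional maxent distribution p(y | x) over the candidate Korean sequences ys."""
    scores = [sum(w * fires(f, x, y) for f, w in weights.items()) for y in ys]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def gain(feat, weights, p_xy, p_x, ys, lo=-10.0, hi=10.0, iters=60):
    """Log-likelihood gain of adding feat with weight alpha, other weights held fixed.

    G(alpha) = alpha * E~[f] - sum_x p~(x) * log sum_y p(y|x) * exp(alpha * f(x, y))
    """
    emp = sum(p * fires(feat, x, y) for (x, y), p in p_xy.items())
    base = {x: model_probs(weights, x, ys) for x in p_x}

    def log_z_and_expect(alpha):
        log_z = expect = 0.0
        for x, px in p_x.items():
            terms = [p * math.exp(alpha * fires(feat, x, y)) for p, y in zip(base[x], ys)]
            z = sum(terms)
            log_z += px * math.log(z)
            expect += px * sum(t * fires(feat, x, y) for t, y in zip(terms, ys)) / z
        return log_z, expect

    # G is concave in alpha, so G'(alpha) = emp - expect is non-increasing:
    # bisect for its root, with alpha clipped to [lo, hi].
    for _ in range(iters):
        mid = (lo + hi) / 2
        if emp - log_z_and_expect(mid)[1] > 0:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    return alpha * emp - log_z_and_expect(alpha)[0], alpha


def select_features(train, n_features=5, min_gain=1e-3):
    """Greedily add the candidate feature with the largest log-likelihood gain."""
    n = len(train)
    p_xy = {pair: c / n for pair, c in Counter(train).items()}
    p_x = {x: c / n for x, c in Counter(x for x, _ in train).items()}
    ys = sorted({y for _, y in train})
    candidates = sorted({(e, k) for x, y in train for e in x for k in y})
    weights = {}
    for _ in range(n_features):
        scored = [(gain(f, weights, p_xy, p_x, ys), f) for f in candidates if f not in weights]
        if not scored:
            break
        (best_gain, best_alpha), best_f = max(scored)
        if best_gain < min_gain:
            break
        # Simplification: earlier weights are not re-estimated (a full system would refit them).
        weights[best_f] = best_alpha
    return weights


def avg_log_likelihood(weights, train):
    """Average conditional log-likelihood of the training pairs under the model."""
    ys = sorted({y for _, y in train})
    return sum(math.log(model_probs(weights, x, ys)[ys.index(y)]) for x, y in train) / len(train)


if __name__ == "__main__":
    selected = select_features(TRAIN)
    print(selected)
    print(avg_log_likelihood({}, TRAIN), avg_log_likelihood(selected, TRAIN))
```

A candidate's gain is the increase in average conditional log-likelihood from adding that feature with its best single weight. This is why a threshold such as the hypothetical `min_gain` works as a stopping rule. Features that end up with large weights could then be used to rule out alignments whose tag mappings the model considers unlikely. This sketch does not show that step.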