Improved Chinese--English SMT with Chinese “DE” Construction Classification and Reordering

Authors:
Jinhua Du;Andy Way
Affiliations:
Dublin City University;Dublin City University
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2011

Citing 18
Cited 0

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A systematic comparison of various statistical alignment models

Computational Linguistics
Three generative, lexicalised models for statistical parsing

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Is it harder to parse Chinese, or the Chinese Treebank?

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Enriching the knowledge sources used in a maximum entropy part-of-speech tagger

EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
A hierarchical phrase-based model for statistical machine translation

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Clause restructuring for statistical machine translation

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Improving a statistical MT system with automatically learned rewrite patterns

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics

HLT '02 Proceedings of the second international conference on Human Language Technology Research
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Sequential labeling with latent variables: an exact inference algorithm and its efficient approximation

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Syntactic reordering integrated with phrase-based SMT

SSST '08 Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation
Disambiguating "DE" for Chinese-English machine translation

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Syntax augmented machine translation via chart parsing

StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
Syntax directed translations and the pushdown assembler

Journal of Computer and System Sciences
A discriminative latent variable-based "DE" classifier for Chinese--English SMT

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Syntactic reordering on the source side has been demonstrated to be helpful and effective for handling different word orders between source and target languages in SMT. In this article, we focus on the Chinese (DE) construction which is flexible and ubiquitous in Chinese and has many different ways to be translated into English so that it is a major source of word order differences in terms of translation quality. This article carries out the Chinese “DE” construction study for Chinese--English SMT in which we propose a new classifier model---discriminative latent variable model (DPLVM)---with new features to improve the classification accuracy and indirectly improve the translation quality compared to a log-linear classifier. The DE classifier is used to recognize DE structures in both training and test sentences of Chinese, and then perform word reordering to make the Chinese sentences better match the word order of English. In order to investigate the impact of the DE classification and reordering in the source side on different types of SMT systems (namely PB-SMT, hierarchical PB-SMT (HPB-SMT) as well as the syntax-based SMT (SAMT)), we conduct a series of experiments on NIST 2005 and 2008 test sets to verify the effectiveness of our proposed model. The experimental results show that the MT systems using the data reordered by our proposed model outperform the baseline systems by 3.01% and 4.03% relative points on the NIST 2005 test set, 4.64% and 4.62% relative points on the NIST 2008 test set in terms of BLEU score for PB-SMT and HPB-SMT respectively. However, the DE classification method does not perform significantly well for SAMT. Additionally, we also conducted some experiments to evaluate our DE classification and reordering approach on the word alignment and phrase table in terms of these three types of SMT systems.