Improved Chinese–English SMT with Chinese “DE” Construction Classification and Reordering

  • Authors:
  • Jinhua Du; Andy Way

  • Affiliations:
  • Dublin City University; Dublin City University

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2011

Abstract

Syntactic reordering on the source side has been shown to be effective for handling word order differences between source and target languages in statistical machine translation (SMT). In this article, we focus on the Chinese “DE” construction, which is flexible and ubiquitous in Chinese and can be translated into English in many different ways, making it a major source of word order differences that affect translation quality. We study the “DE” construction for Chinese–English SMT and propose a new classifier, a discriminative latent variable model (DPLVM), with new features that improve classification accuracy and, indirectly, translation quality compared to a log-linear classifier. The DE classifier is used to recognize DE structures in both the training and test sentences of Chinese, and word reordering is then performed so that the Chinese sentences better match English word order. To investigate the impact of source-side DE classification and reordering on different types of SMT systems, namely phrase-based SMT (PB-SMT), hierarchical PB-SMT (HPB-SMT), and syntax-based SMT (SAMT), we conduct a series of experiments on the NIST 2005 and 2008 test sets. The experimental results show that, in terms of BLEU score, MT systems using data reordered by our proposed model outperform the baseline systems by 3.01% and 4.03% relative on the NIST 2005 test set, and by 4.64% and 4.62% relative on the NIST 2008 test set, for PB-SMT and HPB-SMT respectively. However, the DE classification method does not yield a significant improvement for SAMT. Additionally, we conduct experiments to evaluate the effect of our DE classification and reordering approach on the word alignment and phrase table for these three types of SMT systems.
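
As a rough illustration of the reordering step described in the abstract, the sketch below swaps the material around a single DE (的) once a label has been assigned to the construction. The function name `reorder_de`, the labels "swap" and "keep", and the swap rule itself are hypothetical simplifications chosen for this example; the abstract does not specify the article's actual DE classes or reordering rules.

```python
# Toy illustration only: the labels and the swap rule are assumptions,
# not the classifier output or reordering rules from the article.
DE = "的"


def reorder_de(pre, post, label):
    """Reorder one 'pre DE post' construction given a hypothetical label.

    pre, post: token lists before and after DE.
    label: "swap" moves the pre-DE modifier after the head, loosely
    mirroring English post-modification; any other label keeps the
    original Chinese order.
    """
    if label == "swap":
        return post + [DE] + pre
    return pre + [DE] + post


# "桌子 上 的 书" (roughly "the book on the table"): modifier precedes the head.
example = reorder_de(["桌子", "上"], ["书"], "swap")
print(" ".join(example))
```

In a real pipeline the label would come from the trained DE classifier, and the same reordering would be applied to both training and test sentences before word alignment.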