Disambiguating "DE" for Chinese-English machine translation

  • Authors:
  • Pi-Chuan Chang;Dan Jurafsky;Christopher D. Manning

  • Affiliations:
  • Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA

  • Venue:
  • StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
  • Year:
  • 2009

Abstract

Linking constructions involving 的 (DE) are ubiquitous in Chinese, and can be translated into English in many different ways. This is a major source of machine translation error, even when syntax-sensitive translation models are used. This paper explores how getting more information about the syntactic, semantic, and discourse context of uses of 的 (DE) can facilitate producing an appropriate English translation strategy. We describe a finer-grained classification of 的 (DE) constructions in Chinese NPs, construct a corpus of annotated examples, and then train a log-linear classifier, which contains linguistically inspired features. We use the DE classifier to preprocess MT data by explicitly labeling 的 (DE) constructions, as well as reordering phrases, and show that our approach provides significant BLEU point gains on MT02 (+1.24), MT03 (+0.88) and MT05 (+1.49) on a phrase-based system. The improvement persists when a hierarchical reordering model is applied.
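
The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two class names (`poss`, `relc`), the features, the weights, and the output token format are all hypothetical, chosen only to show how a log-linear classifier could tag each DE token before the data reaches the MT system. The reordering step is not shown.

```python
import math

# Hypothetical labels and feature weights, for illustration only.
# The abstract does not list the paper's actual classes or features.
LABELS = ["poss", "relc"]
WEIGHTS = {
    ("prev_is_noun", "poss"): 1.5,
    ("prev_is_verb", "relc"): 2.0,
}


def classify(features):
    """Log-linear model: p(y | x) is proportional to exp(sum of active weights)."""
    scores = {y: sum(WEIGHTS.get((f, y), 0.0) for f in features) for y in LABELS}
    z = sum(math.exp(s) for s in scores.values())
    probs = {y: math.exp(s) / z for y, s in scores.items()}
    # Ties resolve to the first label in LABELS.
    return max(probs, key=probs.get), probs


def label_de(tokens, tags):
    """Rewrite each DE token as DE plus its predicted class (assumed format)."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == "的" and i > 0:
            prev = tags[i - 1]
            feats = []
            if prev == "NN":
                feats.append("prev_is_noun")
            elif prev == "VV":
                feats.append("prev_is_verb")
            label, _ = classify(feats)
            out.append(f"的_{label}")
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    print(label_de(["学生", "的", "书"], ["NN", "DEG", "NN"]))
    print(label_de(["来", "的", "人"], ["VV", "DEC", "NN"]))
```

A trained model would learn its weights from the annotated corpus rather than use hand-set values, and would draw on a much richer feature set than the previous token's part-of-speech tag.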