A dependency treelet string correspondence model for statistical machine translation

  • Authors:
  • Deyi Xiong;Qun Liu;Shouxun Lin

  • Affiliations:
  • Chinese Academy of Sciences, Beijing, China;Chinese Academy of Sciences, Beijing, China;Chinese Academy of Sciences, Beijing, China

  • Venue:
  • StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes a novel model using dependency structures on the source side for syntax-based statistical machine translation: Dependency Treelet String Correspondence Model (DTSC). The DTSC model maps source dependency structures to target strings. In this model translation pairs of source treelets and target strings with their word alignments are learned automatically from the parsed and aligned corpus. The DTSC model allows source treelets and target strings with variables so that the model can generalize to handle dependency structures with the same head word but with different modifiers and arguments. Additionally, target strings can be also discontinuous by using gaps which are corresponding to the uncovered nodes which are not included in the source treelets. A chart-style decoding algorithm with two basic operations--substituting and attaching--is designed for the DTSC model. We argue that the DTSC model proposed here is capable of lexicalization, generalization, and handling discontinuous phrases which are very desirable for machine translation. We finally evaluate our current implementation of a simplified version of DTSC for statistical machine translation.