A tree-to-string phrase-based model for statistical machine translation

  • Authors:
  • Thai Phuong Nguyen; Akira Shimazu; Tu-Bao Ho; Minh Le Nguyen; Vinh Van Nguyen

  • Affiliations:
  • Vietnam National University, Hanoi; Japan Advanced Institute of Science and Technology; Japan Advanced Institute of Science and Technology; Japan Advanced Institute of Science and Technology; Japan Advanced Institute of Science and Technology

  • Venue:
  • CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
  • Year:
  • 2008

Abstract

Though phrase-based SMT has achieved high translation quality, it still lacks the generalization ability to capture word order differences between languages. In this paper we describe a general method for tree-to-string phrase-based SMT. We study how syntactic transformation can be incorporated into phrase-based SMT and how effective it is. We design syntactic transformation models using an unlexicalized form of synchronous context-free grammars. These models can be learned from source-parsed bitext. Our system can naturally make use of both constituent and non-constituent phrasal translations in the decoding phase. We consider various levels of syntactic analysis, ranging from chunking to full parsing. Our experimental results on English-Japanese and English-Vietnamese translation show a significant improvement over two baseline phrase-based SMT systems.
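
To make the idea of an unlexicalized syntactic transformation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the rule table `RULES`, its probabilities, the tuple-based tree encoding, and the helper names `reorder` and `leaves` are all hypothetical. Each rule refers only to syntactic labels (never to words) and maps a parent label and its child label sequence to candidate child permutations. The sketch greedily applies the most probable permutation at each node, whereas the abstract describes integrating such models into decoding.

```python
# Hypothetical rule table (invented for illustration): keys are
# (parent label, child label sequence); values are candidate child
# permutations with made-up probabilities. No words appear in the rules,
# which is what "unlexicalized" means here.
RULES = {
    ("VP", ("VBD", "NP")): [((1, 0), 0.8), ((0, 1), 0.2)],
    ("PP", ("IN", "NP")): [((1, 0), 0.9), ((0, 1), 0.1)],
}


def reorder(node):
    """Recursively reorder children using the most probable matching rule."""
    label, body = node
    if isinstance(body, str):  # leaf: (POS tag, word)
        return node
    children = [reorder(child) for child in body]
    key = (label, tuple(child[0] for child in children))
    candidates = RULES.get(key)
    if candidates:
        # Greedy choice: take the permutation with the highest probability.
        perm, _prob = max(candidates, key=lambda pc: pc[1])
        children = [children[i] for i in perm]
    return (label, children)


def leaves(node):
    """Return the words of a tree in left-to-right order."""
    _label, body = node
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]


# A toy source-side parse of "he read the book".
tree = ("S", [
    ("NP", [("PRP", "he")]),
    ("VP", [
        ("VBD", "read"),
        ("NP", [("DT", "the"), ("NN", "book")]),
    ]),
])

print(leaves(reorder(tree)))  # ['he', 'the', 'book', 'read']
```

With the toy rule for `VP`, the verb moves after its object, giving a verb-final source order of the kind one would want before translating into Japanese. A real system would estimate the rule probabilities from source-parsed bitext and keep multiple reordering hypotheses rather than committing to one.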