Enhancing statistical machine translation with character alignment

  • Authors:
  • Ning Xi;Guangchao Tang;Xinyu Dai;Shujian Huang;Jiajun Chen

  • Affiliations:
  • Nanjing University, Nanjing, China;Nanjing University, Nanjing, China;Nanjing University, Nanjing, China;Nanjing University, Nanjing, China;Nanjing University, Nanjing, China

  • Venue:
  • ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
  • Year:
  • 2012

Abstract

The dominant practice in statistical machine translation (SMT) uses the same Chinese word segmentation specification for both the alignment and the translation rule induction steps when building a Chinese-English SMT system. This may be suboptimal, because a word segmentation that is better for alignment is not necessarily better for translation. To tackle this, we propose a framework that uses two different segmentation specifications for alignment and translation respectively: we use the Chinese character as the basic unit for alignment, and then convert this alignment to a conventional word alignment for translation rule induction. Experimentally, our approach outperformed two baselines, a fully word-based system (using words for both alignment and translation) and a fully character-based system, in terms of alignment quality and translation performance.
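
The conversion step described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the conversion rule, so the rule used here is an assumption: a Chinese word is linked to an English word whenever at least one of its characters is linked to that English word. The function name `char_to_word_alignment`, the link representation as index pairs, and the example sentence are all hypothetical and chosen for illustration only.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source index, target index)


def char_to_word_alignment(words: List[str], char_links: Set[Link]) -> Set[Link]:
    """Project character-level alignment links onto word-level links.

    Assumed rule (not taken from the paper): a word inherits every
    link held by any of its characters.
    """
    # Map each character position in the sentence to the index of its word.
    char_to_word: List[int] = []
    for word_index, word in enumerate(words):
        char_to_word.extend([word_index] * len(word))

    word_links: Set[Link] = set()
    for char_index, target_index in char_links:
        if not 0 <= char_index < len(char_to_word):
            raise ValueError(f"character index {char_index} is out of range")
        word_links.add((char_to_word[char_index], target_index))
    return word_links


# Hypothetical example: a segmented Chinese sentence and its English side.
words = ["我们", "喜欢", "翻译"]
english = ["we", "like", "translation"]

# Character-level links: (Chinese character index, English word index).
char_links = {(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)}

word_links = char_to_word_alignment(words, char_links)
for source_index, target_index in sorted(word_links):
    print(words[source_index], "->", english[target_index])
```

Under this assumed rule, several character links that fall inside the same word collapse into a single word link, so the word-level alignment is never larger than the character-level one.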