Learning phrase boundaries for hierarchical phrase-based translation

  • Authors:
  • Zhongjun He; Yao Meng; Hao Yu

  • Affiliations:
  • Fujitsu R&D Center CO., LTD.; Fujitsu R&D Center CO., LTD.; Fujitsu R&D Center CO., LTD.

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
  • Year:
  • 2010

Abstract

Hierarchical phrase-based models provide a powerful mechanism for capturing non-local phrase reorderings in statistical machine translation (SMT). However, many of the resulting reorderings are arbitrary because the models are weak at determining phrase boundaries for pattern matching. This paper presents a novel approach that learns phrase boundaries directly from a word-aligned corpus, without using any syntactic information. The phrase boundaries, which mark where a phrase reordering begins and ends, serve as soft constraints during decoding. Experimental results and analysis show that the approach yields significant improvements over the baseline on large-scale Chinese-to-English translation.