A Maximum-Entropy Segmentation Model for Statistical Machine Translation

  • Authors:
  • Deyi Xiong; Min Zhang; Haizhou Li

  • Affiliations:
  • Dept. of Human Language Technology, Institute for Infocomm Research, Singapore

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2011

Abstract

Segmentation is of great importance to statistical machine translation (SMT): it splits a source sentence into sequences of translatable segments. We propose a maximum-entropy segmentation model to capture desirable phrasal and hierarchical segmentations for SMT. We present an approach that automatically learns the beginning and ending boundaries of cohesive segments from word-aligned bilingual data without using any additional resources. The learned boundaries are then used to define cohesive segments in both phrasal and hierarchical segmentations. We integrate the segmentation model into phrasal SMT and conduct experiments in the newswire and broadcast news domains to investigate the effectiveness of the proposed model on large-scale training data. Our experimental results show that the maximum-entropy segmentation model significantly improves translation quality in terms of BLEU. We further validate that 1) the proposed segmentation model significantly outperforms the syntactic constraints used in previous work to constrain segmentations; and 2) it is necessary to capture hierarchical segmentations in addition to phrasal segmentations.
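
The abstract does not spell out how boundaries are derived from word alignments, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that a source span counts as "cohesive" when it satisfies the standard phrase-extraction consistency criterion (no alignment link leaves the span's target range), and that a word is a beginning (or ending) boundary when it starts (or ends) at least one such multi-word span. The names `is_consistent`, `boundary_labels`, and `max_len` are hypothetical. Labels of this kind could then serve as training examples for a maximum-entropy classifier.

```python
from typing import List, Set, Tuple

# An alignment is a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def is_consistent(i: int, j: int, links: Alignment) -> bool:
    """Return True if source span [i, j] is consistent with the alignment.

    Assumption: "cohesive" is approximated by phrase-pair consistency,
    i.e. no target word inside the span's target range is linked to a
    source word outside [i, j].
    """
    targets = [t for s, t in links if i <= s <= j]
    if not targets:  # a span with no aligned word is not usable
        return False
    lo, hi = min(targets), max(targets)
    return all(i <= s <= j for s, t in links if lo <= t <= hi)


def boundary_labels(
    n: int, links: Alignment, max_len: int = 7
) -> Tuple[List[bool], List[bool]]:
    """Label each of n source words as a beginning and/or ending boundary.

    A word begins (ends) a segment if some consistent span of two or more
    words, at most max_len words long, starts (ends) at that word.
    """
    begins = [False] * n
    ends = [False] * n
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len)):
            if is_consistent(i, j, links):
                begins[i] = True
                ends[j] = True
    return begins, ends


# Toy example: source words 1 and 2 are swapped on the target side.
toy_links = {(0, 0), (1, 2), (2, 1), (3, 3)}
toy_begins, toy_ends = boundary_labels(4, toy_links)
print(toy_begins)  # [True, True, False, False]
print(toy_ends)    # [False, False, True, True]
```

In the toy example, the swapped pair of words 1 and 2 forms a consistent span, while spans that cut the swap in half (such as words 0 to 1) do not, which is why word 2 can end a segment but cannot begin one.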