Optimizing Chinese word segmentation for machine translation performance

  • Authors:
  • Pi-Chuan Chang;Michel Galley;Christopher D. Manning

  • Affiliations:
  • Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA

  • Venue:
  • StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Previous work has shown that Chinese word segmentation is useful for machine translation to English, yet the way different segmentation strategies affect MT is still poorly understood. In this paper, we demonstrate that optimizing segmentation for an existing segmentation standard does not always yield better MT performance. We find that other factors such as segmentation consistency and granularity of Chinese "words" can be more important for machine translation. Based on these findings, we implement methods inside a conditional random field segmenter that directly optimize segmentation granularity with respect to the MT task, providing an improvement of 0.73 BLEU. We also show that improving segmentation consistency using external lexicon and proper noun features yields a 0.32 BLEU increase.