Sentence Splitting for Vietnamese-English Machine Translation

  • Authors:
  • Bui Thanh Hung;Nguyen Le Minh;Akira Shimazu

  • Venue:
  • KSE '12 Proceedings of the 2012 Fourth International Conference on Knowledge and Systems Engineering
  • Year:
  • 2012

Abstract

Translation quality is often disappointing when a phrase-based machine translation system handles long sentences. Because of syntactic structure discrepancies between the two languages, the translation output does not preserve the word order of the source. When a sentence is long, it should be partitioned into several clauses, and word reordering in the translation should be done within clauses, not between them. In this paper, a rule-based technique is proposed to split long Vietnamese sentences based on linguistic information. We use the splitting boundaries to translate sentences under two types of constraints: wall and zone. This method is useful for preserving word order and improving translation quality. We describe experiments on translation from Vietnamese to English, showing improvements in BLEU and NIST scores.
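
The idea of marking clause boundaries so that reordering stays inside each clause can be illustrated with a minimal Python sketch. It is not the authors' rule set, which the abstract does not give. The boundary word list, the length threshold `min_len`, and the function names `insert_walls` and `split_at_walls` are all assumptions made for illustration. The `<wall />` marker string is modeled on the reordering-constraint markup of the Moses decoder; whether the paper used that exact markup is also an assumption.

```python
from typing import List, Set

# Hypothetical clause-boundary cues (common Vietnamese conjunctions).
# The paper's actual linguistic rules are not listed in the abstract.
BOUNDARY_WORDS: Set[str] = {"nhưng", "và", "vì", "nên"}

# Marker string modeled on Moses-style wall markup (an assumption here).
WALL = "<wall />"


def insert_walls(tokens: List[str],
                 min_len: int = 8,
                 boundary_words: Set[str] = BOUNDARY_WORDS) -> List[str]:
    """Insert a wall marker before each boundary word of a long sentence.

    Sentences shorter than min_len tokens are returned unchanged.
    """
    if len(tokens) < min_len:
        return list(tokens)
    out: List[str] = []
    for i, tok in enumerate(tokens):
        is_inner = 0 < i < len(tokens) - 1  # never put a wall at the edges
        if is_inner and tok.lower() in boundary_words and out[-1] != WALL:
            out.append(WALL)
        out.append(tok)
    return out


def split_at_walls(tokens: List[str]) -> List[List[str]]:
    """Group tokens into clauses, using wall markers as separators."""
    clauses: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok == WALL:
            if current:
                clauses.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        clauses.append(current)
    return clauses


if __name__ == "__main__":
    sentence = "tôi muốn đi chơi nhưng trời mưa rất to".split()
    marked = insert_walls(sentence)
    print(" ".join(marked))
    print(split_at_walls(marked))
```

A wall forbids reordering across a single point, whereas a zone would wrap a whole span that must be translated as a block; the sketch covers only walls to stay short.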