Divide and translate: improving long distance reordering in statistical machine translation

  • Authors:
  • Katsuhito Sudoh; Kevin Duh; Hajime Tsukada; Tsutomu Hirao; Masaaki Nagata

  • Affiliations:
  • NTT Communication Science Laboratories, Soraku-gun, Kyoto, Japan (all authors)

  • Venue:
  • WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
  • Year:
  • 2010

Abstract

This paper proposes a novel method for long-distance, clause-level reordering in statistical machine translation (SMT). The proposed method translates the clauses of the source sentence separately and reconstructs the target sentence from the clause translations, which contain non-terminals. The non-terminals are placeholders for embedded clauses, and they let us reduce complicated clause-level reordering to simple word-level reordering. The translation model is trained on a bilingual corpus with clause-level alignment, which our alignment algorithm annotates automatically using a syntactic parser for the source language. We achieved significant improvements of 1.4% in BLEU and 1.3% in TER using Moses, and 2.2% in BLEU and 3.5% in TER using our hierarchical phrase-based SMT system, for English-to-Japanese translation of research paper abstracts in the medical domain.
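
The divide-and-reconstruct idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the placeholder syntax `__s1__`, the function names, and the lookup-table "translator" are all assumptions made for illustration, and a real system would call an SMT decoder such as Moses in place of the toy table. The sketch only shows how a non-terminal, once moved like an ordinary word inside its parent clause, carries the whole embedded clause with it.

```python
import re

# Hypothetical placeholder format for an embedded clause, e.g. "__s1__".
# The actual non-terminal notation used in the paper is not given in the abstract.
PLACEHOLDER = re.compile(r"__s(\d+)__")


def reconstruct(clause_translations, root=0):
    """Substitute each placeholder with the translation of the clause it names.

    Assumes the clause references form a tree (no cycles).
    """
    def expand(idx):
        return PLACEHOLDER.sub(
            lambda m: expand(int(m.group(1))), clause_translations[idx]
        )
    return expand(root)


def divide_and_translate(clauses, translate_clause, root=0):
    """Translate every clause independently, then rebuild the target sentence."""
    translations = {i: translate_clause(c) for i, c in clauses.items()}
    return reconstruct(translations, root)


# Toy stand-in for a clause-level translator (assumption: a real system
# would decode each clause with an SMT model instead of a lookup table).
TOY_TABLE = {
    "john lost the book __s1__": "ジョンは __s1__ 本をなくした",
    "that was borrowed from mary": "メアリーから借りた",
}

# Source sentence already split into clauses; clause 0 embeds clause 1.
clauses = {
    0: "john lost the book __s1__",
    1: "that was borrowed from mary",
}

result = divide_and_translate(clauses, TOY_TABLE.__getitem__)
print(result)
```

In the toy example the placeholder moves from the end of the English clause to the position before the noun in the Japanese clause, so the long-distance movement of the relative clause is handled as the movement of a single token.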