Chunk-lattices for verb reordering in Arabic-English statistical machine translation

  • Authors:
  • Arianna Bisazza;Daniele Pighin;Marcello Federico

  • Affiliations:
  • Fondazione Bruno Kessler-IRST, Trento, Italy;Fondazione Bruno Kessler-IRST, Trento, Italy and TALP Research Center-Universitat Politècnica de Catalunya, Barcelona, Spain;Fondazione Bruno Kessler-IRST, Trento, Italy

  • Venue:
  • Machine Translation
  • Year:
  • 2012

Abstract

Syntactic disfluencies in Arabic-to-English phrase-based SMT output are often caused by incorrect verb reordering in Verb-Subject-Object sentences. As a solution, we propose a chunk-based reordering technique that automatically displaces clause-initial verbs on the Arabic side of a word-aligned parallel corpus. This method is used to preprocess the training data and to collect statistics about verb movements. Based on this analysis, we build specific verb reordering lattices for the test sentences before decoding and test different lattice-weighting schemes. Finally, we train a feature-rich discriminative model to predict likely verb reorderings for a given Arabic sentence. The model scores are used to prune the reordering lattice, leading to better word reordering at decoding time. Applying our reordering methods to the training and test data yields consistent improvements on the NIST-MT 2009 Arabic-English benchmark, both in BLEU (+1.06%) and in reordering quality (+0.85%) as measured by the Kendall Reordering Score.
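
To make the idea of chunk-level verb displacement concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a sentence is already split into tagged chunks, that the verb chunk carries a hypothetical tag "VC", and that a hypothetical parameter `max_shift` bounds how far right the verb may move. It simply enumerates the original order plus each shifted variant; these are the alternative paths a reordering lattice could encode, before any weighting or pruning.

```python
from typing import List, Tuple

# A chunk is (tag, tokens). The tag names used here are assumptions for illustration.
Chunk = Tuple[str, List[str]]


def verb_reorderings(chunks: List[Chunk], max_shift: int = 3) -> List[List[Chunk]]:
    """Return the original chunk order plus variants in which the first
    verb chunk is moved right by 1..max_shift chunk positions."""
    variants = [list(chunks)]
    # Locate the first verb chunk (hypothetical tag "VC").
    verb_idx = next((i for i, (tag, _) in enumerate(chunks) if tag == "VC"), None)
    if verb_idx is None:
        return variants
    for shift in range(1, max_shift + 1):
        target = verb_idx + shift
        if target >= len(chunks):
            break  # cannot move past the end of the sentence
        reordered = list(chunks)
        verb = reordered.pop(verb_idx)
        reordered.insert(target, verb)
        variants.append(reordered)
    return variants


def flatten(chunks: List[Chunk]) -> str:
    """Join all chunk tokens back into a plain token string."""
    return " ".join(tok for _, toks in chunks for tok in toks)


if __name__ == "__main__":
    # Placeholder tokens standing in for a Verb-Subject-Object sentence.
    sentence = [("VC", ["v"]), ("NC", ["s1", "s2"]), ("NC", ["o"]), ("PC", ["p"])]
    for variant in verb_reorderings(sentence, max_shift=2):
        print(flatten(variant))
```

In a full system, each variant would presumably become a weighted path in the lattice, and a model such as the discriminative classifier mentioned above would decide which paths to keep; that part is outside the scope of this sketch.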