Recursive alignment block classification technique for word reordering in statistical machine translation

  • Authors:
  • Marta R. Costa-Jussà; José A. Fonollosa; Enric Monte

  • Affiliations:
  • Barcelona Media Innovation Center, Barcelona, Spain 08018; Universitat Politècnica de Catalunya, TALP Research Center, Barcelona, Spain 08034; Universitat Politècnica de Catalunya, TALP Research Center, Barcelona, Spain 08034

  • Venue:
  • Language Resources and Evaluation
  • Year:
  • 2011

Abstract

Statistical machine translation (SMT) is based on alignment models that learn word correspondences between the source and target languages from bilingual corpora. These models are assumed to be capable of learning reorderings. However, differences in word order between two languages remain one of the most important sources of errors in SMT. In this paper, we show that SMT can take advantage of inductive learning to solve reordering problems. Given a word alignment, we identify those pairs of consecutive source blocks (sequences of words) whose translations are swapped, i.e. those blocks which, if swapped, generate a correct monotonic translation. We then classify these pairs into groups, recursively following a block co-occurrence criterion, in order to infer reorderings. Within a group, we allow new internal combinations in order to generalize the reordering to unseen pairs of blocks. Next, we identify the pairs of blocks in the source corpora (both training and test) that belong to the same group. We swap them and use the modified source training corpora to realign and to build the final translation system. We have evaluated our reordering approach in terms of both alignment and translation quality, using two state-of-the-art SMT systems: a phrase-based one and an Ngram-based one. Experiments are reported on the EuroParl task, showing improvements of close to 1 point in the standard MT evaluation metrics (mWER and BLEU).
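
To make the first step of the abstract concrete, the following is a minimal sketch of how pairs of consecutive source blocks with swapped translations might be read off a word alignment. It is not the authors' implementation: the function names `swapped_block_pairs` and `swap_blocks`, the representation of the alignment as a set of (source index, target index) links, and the exhaustive search over block boundaries are all assumptions made for illustration. The grouping and recursive classification steps are not covered.

```python
from typing import List, Set, Tuple


def swapped_block_pairs(
    alignment: Set[Tuple[int, int]], src_len: int
) -> List[Tuple[int, int, int]]:
    """Return (i, j, k) triples where source blocks [i, j) and [j, k)
    are adjacent and every target position linked to the second block
    precedes every target position linked to the first block.

    Hypothetical helper: brute-force search, fine for a single sentence.
    """
    # Target positions linked to each source position.
    tgt = [[t for s, t in alignment if s == pos] for pos in range(src_len)]
    pairs = []
    for i in range(src_len):
        for j in range(i + 1, src_len):
            for k in range(j + 1, src_len + 1):
                left = [t for pos in range(i, j) for t in tgt[pos]]
                right = [t for pos in range(j, k) for t in tgt[pos]]
                # Skip blocks with no aligned words (an assumption of this sketch).
                if left and right and max(right) < min(left):
                    pairs.append((i, j, k))
    return pairs


def swap_blocks(tokens: List[str], i: int, j: int, k: int) -> List[str]:
    """Swap the adjacent blocks tokens[i:j] and tokens[j:k]."""
    return tokens[:i] + tokens[j:k] + tokens[i:j] + tokens[k:]


# Illustrative sentence pair: "un programa ambicioso" / "an ambitious programme"
source = ["un", "programa", "ambicioso"]
links = {(0, 0), (1, 2), (2, 1)}
found = swapped_block_pairs(links, len(source))
print(found)                          # [(1, 2, 3)]
print(swap_blocks(source, *found[0])) # ['un', 'ambicioso', 'programa']
```

After the swap, the source word order matches the target order, so the alignment becomes monotonic. This is the property the abstract relies on when the modified source corpus is realigned.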