A data mining approach to learn reorder rules for SMT

  • Authors: P. V. S. Avinesh

  • Affiliations: IIIT Hyderabad, Language Technologies Research Centre

  • Venue: HLT-SRWS '10: Proceedings of the NAACL HLT 2010 Student Research Workshop
  • Year: 2010

Abstract

In this paper, we describe a syntax-based, source-side reordering method for phrase-based statistical machine translation (SMT) systems. The source side of the training corpus is first parsed, and reordering rules are then learnt automatically from source-side phrases and word alignments. The source-side training and test corpora are subsequently reordered and passed to the SMT system. Reordering is a common problem in language pairs of distant origins. This paper describes an automated approach for learning reorder rules from a word-aligned parallel corpus using association rule mining. The reordered and generalized rules are the most significant part of our approach. Our experiments were conducted on an English-Hindi EILMT corpus.
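The abstract does not spell out the mining procedure, so the following is only a minimal sketch of how reorder rules might be mined from word alignments with the usual support and confidence measures of association rule mining. It is not the paper's implementation. The data layout (child labels of a parse node, their source token spans, and a set of alignment pairs), the thresholds, and the names `target_order`, `mine_rules`, and `reorder` are all assumptions made for illustration.

```python
from collections import Counter


def target_order(child_spans, alignment):
    """Order child indices by the mean target position of their aligned words.

    child_spans: list of (start, end) source token spans, end exclusive.
    alignment:   set of (source_index, target_index) pairs.
    Returns None when some child has no aligned word.
    """
    means = []
    for start, end in child_spans:
        targets = [t for s, t in alignment if start <= s < end]
        if not targets:
            return None
        means.append(sum(targets) / len(targets))
    return tuple(sorted(range(len(means)), key=lambda i: means[i]))


def mine_rules(instances, min_support=2, min_confidence=0.6):
    """Keep, per child-label sequence, the most frequent target-side order
    that passes both thresholds (assumed values, not taken from the paper).

    instances: iterable of (labels, child_spans, alignment) triples.
    """
    lhs_counts = Counter()
    rule_counts = Counter()
    for labels, child_spans, alignment in instances:
        order = target_order(child_spans, alignment)
        if order is None:
            continue
        lhs = tuple(labels)
        lhs_counts[lhs] += 1
        rule_counts[(lhs, order)] += 1

    rules = {}
    # most_common() yields the highest counts first, so setdefault keeps
    # the best-supported order for each label sequence.
    for (lhs, order), count in rule_counts.most_common():
        confidence = count / lhs_counts[lhs]
        if count >= min_support and confidence >= min_confidence:
            rules.setdefault(lhs, order)
    return rules


def reorder(labels, children, rules):
    """Apply a learnt rule to one node's children; leave them unchanged otherwise."""
    order = rules.get(tuple(labels))
    if order is None:
        return list(children)
    return [children[i] for i in order]
```

In this sketch, a verb phrase whose children are labelled `("VB", "NP")` and whose alignments mostly place the object before the verb on the Hindi side would yield the rule `("VB", "NP") -> (1, 0)`, which `reorder` then applies to both training and test sentences before they reach the SMT system.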