Application of clause alignment for statistical machine translation

  • Authors:
  • Svetla Koeva;Borislav Rizov;Ivelina Stoyanova;Svetlozara Leseva;Rositsa Dekova;Angel Genov;Ekaterina Tarpomanova;Tsvetana Dimitrova;Hristina Kukova

  • Affiliations:
  • Bulgarian Academy of Sciences, Sofia, Bulgaria;Bulgarian Academy of Sciences, Sofia, Bulgaria;Bulgarian Academy of Sciences, Sofia, Bulgaria;Bulgarian Academy of Sciences, Sofia, Bulgaria;Bulgarian Academy of Sciences, Sofia, Bulgaria;Bulgarian Academy of Sciences, Sofia, Bulgaria;Bulgarian Academy of Sciences, Sofia, Bulgaria;Bulgarian Academy of Sciences, Sofia, Bulgaria;Bulgarian Academy of Sciences, Sofia, Bulgaria

  • Venue:
  • SSST-6 '12 Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
  • Year:
  • 2012

Abstract

The paper presents a new resource-light, flexible method for clause alignment that combines the Gale-Church algorithm with internally collected textual information. The method does not rely on any pre-developed linguistic resources, which makes it well suited to resource-light clause alignment. We experiment with combining the method with the original Gale-Church algorithm (1993) applied to clause alignment. The performance of this flexible method, as it is referred to hereafter, is measured on a specially designed test corpus. Clause alignment is explored as a means of providing improved training data for Statistical Machine Translation (SMT). A series of experiments with Moses demonstrates ways to modify the parallel resource and the effects on translation quality: (1) baseline training with a Bulgarian-English parallel corpus aligned at the sentence level; (2) training based on parallel clause pairs; (3) training with clause reordering, where the clauses in each source language (SL) sentence are reordered according to the order of the clauses in the target language (TL) sentence. Evaluation is based on BLEU score and shows a small improvement when using the clause-aligned corpus.
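
The clause reordering in setting (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `reorder_source_clauses`, the representation of an alignment as 0-based (source index, target index) pairs, the restriction to one-to-one links, and the choice to append unaligned clauses at the end are all assumptions made for illustration.

```python
def reorder_source_clauses(source_clauses, alignment):
    """Reorder source-language clauses to follow target-language clause order.

    source_clauses: list of clause strings in their original SL order.
    alignment: list of (src_idx, tgt_idx) pairs, 0-based, assumed one-to-one
               (a simplification; real clause alignments may be many-to-one).
    """
    # Sort the aligned source indices by the position of their target clause.
    ordered = [src for src, tgt in sorted(alignment, key=lambda pair: pair[1])]

    # Assumption: clauses without an alignment link keep their relative
    # order and are appended after the aligned ones.
    aligned = set(ordered)
    ordered += [i for i in range(len(source_clauses)) if i not in aligned]

    return [source_clauses[i] for i in ordered]


# Hypothetical example: the second SL clause corresponds to the first TL clause.
clauses = ["when he arrived", "she had already left", "as expected"]
links = [(0, 1), (1, 0)]
reordered = reorder_source_clauses(clauses, links)
```

Here `reordered` places "she had already left" before "when he arrived", with the unaligned clause last. Rejoining the reordered clauses into a sentence would give the modified source side for training.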