Automatic category label coarsening for syntax-based machine translation

  • Authors:
  • Greg Hanneman; Alon Lavie

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
  • Year:
  • 2011

Abstract

We consider SCFG-based MT systems that obtain syntactic category labels by parsing both the source and target sides of parallel training data. The resulting joint nonterminals often lead to needlessly large label sets that are not optimized for an MT scenario. This paper presents a method of iteratively coarsening a label set for a particular language pair and training corpus. We apply this label collapsing to Chinese–English and French–English grammars, obtaining test-set improvements of up to 2.8 BLEU, 5.2 TER, and 0.9 METEOR on Chinese–English translation. We also analyze the effect of label collapsing on the grammar and the decoding process.