CCG syntactic reordering models for phrase-based machine translation

  • Authors:
  • Dennis N. Mehay;Chris Brew

  • Affiliations:
  • The Ohio State University, Columbus, OH;Educational Testing Service, Princeton, NJ

  • Venue:
  • WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Statistical phrase-based machine translation requires no linguistic information beyond word-aligned parallel corpora (Zens et al., 2002; Koehn et al., 2003). Unfortunately, this linguistic agnosticism often produces ungrammatical translations. Syntax, or sentence structure, could provide guidance to phrase-based systems, but the "non-constituent" word strings that phrase-based decoders manipulate complicate the use of most recursive syntactic tools. We address these issues by using Combinatory Categorial Grammar, or CCG, (Steedman, 2000), which has a much more flexible notion of constituency, thereby providing more labels for putative non-constituent multiword translation phrases. Using CCG parse charts, we train a syntactic analogue of a lexicalized reordering model by labelling phrase table entries with multiword labels and demonstrate significant improvements in translating between Urdu and English, two language pairs with divergent sentence structure.