Syntax-driven learning of sub-sentential translation equivalents and translation rules from parsed parallel corpora

  • Authors:
  • Alon Lavie;Alok Parlikar;Vamshi Ambati

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • SSST '08 Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation
  • Year:
  • 2008

Abstract

We describe a multi-step process for automatically learning, for a pair of languages, both reliable sub-sentential syntactic phrases that are translation equivalents of each other and syntactic translation rules. The input to the process is a corpus of parallel sentences, word-aligned and annotated with phrase-structure parse trees. We first apply a newly developed algorithm that aligns nodes between the two parallel parse trees of each sentence pair. Next, we extract all aligned sub-sentential syntactic constituents from the parallel sentences and build a syntax-based phrase table from them. Finally, we treat the node alignments as tree decomposition points and extract from the corpus all possible synchronous parallel tree fragments. These are then converted into synchronous context-free rules. We describe the approach and analyze its application to Chinese-English parallel data.
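The first two steps (node alignment, then phrase-table extraction) can be illustrated with a small sketch. It assumes a common word-alignment consistency criterion: a source node and a target node are aligned when no word-alignment link leaves either node's yield and at least one link connects the two yields. This criterion is an illustrative assumption, not necessarily the authors' exact algorithm, and the names `Node`, `align_nodes`, `phrase_table` and the toy Chinese-English trees are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node; preterminals carry the word they dominate."""
    label: str
    children: list[Node] = field(default_factory=list)
    word: Optional[str] = None
    span: tuple[int, int] = (0, 0)  # half-open [start, end) over word positions


def index_spans(node: Node, start: int = 0) -> int:
    """Record the word span each node covers; return the position after it."""
    if node.word is not None:
        node.span = (start, start + 1)
        return start + 1
    pos = start
    for child in node.children:
        pos = index_spans(child, pos)
    node.span = (start, pos)
    return pos


def all_nodes(node: Node) -> Iterator[Node]:
    """Pre-order traversal over every node, preterminals included."""
    yield node
    for child in node.children:
        yield from all_nodes(child)


def words(node: Node) -> list[str]:
    """Sentence words in left-to-right order."""
    return [n.word for n in all_nodes(node) if n.word is not None]


def consistent(src: tuple[int, int], tgt: tuple[int, int],
               links: set[tuple[int, int]]) -> bool:
    """Assumed criterion: no link crosses either span boundary, and one link joins them."""
    joined = False
    for i, j in links:
        in_src = src[0] <= i < src[1]
        in_tgt = tgt[0] <= j < tgt[1]
        if in_src != in_tgt:
            return False
        joined = joined or in_src
    return joined


def align_nodes(src_tree: Node, tgt_tree: Node,
                links: set[tuple[int, int]]) -> list[tuple[Node, Node]]:
    """Pair every source node with every target node whose yields are consistent."""
    index_spans(src_tree)
    index_spans(tgt_tree)
    return [(s, t)
            for s in all_nodes(src_tree)
            for t in all_nodes(tgt_tree)
            if consistent(s.span, t.span, links)]


def phrase_table(src_tree: Node, tgt_tree: Node,
                 links: set[tuple[int, int]]) -> set[tuple[str, str, str, str]]:
    """Entries of the form (source label, source phrase, target label, target phrase)."""
    pairs = align_nodes(src_tree, tgt_tree, links)
    src_words, tgt_words = words(src_tree), words(tgt_tree)
    return {(s.label, " ".join(src_words[s.span[0]:s.span[1]]),
             t.label, " ".join(tgt_words[t.span[0]:t.span[1]]))
            for s, t in pairs}


def leaf(tag: str, w: str) -> Node:
    return Node(tag, word=w)


# Toy sentence pair "我 喜欢 猫" / "I like cats" with a monotone word alignment.
src = Node("IP", [Node("NP", [leaf("PN", "我")]),
                  Node("VP", [leaf("VV", "喜欢"), Node("NP", [leaf("NN", "猫")])])])
tgt = Node("S", [Node("NP", [leaf("PRP", "I")]),
                 Node("VP", [leaf("VBP", "like"), Node("NP", [leaf("NNS", "cats")])])])
links = {(0, 0), (1, 1), (2, 2)}

for entry in sorted(phrase_table(src, tgt, links)):
    print(entry)
```

Under this criterion, unary chains (for example NP over PN) cover the same span, so every node in a chain pairs with every node in the corresponding target chain; the sketch does not choose among them. The root pair covers the whole sentence and can be dropped if only sub-sentential entries are wanted. The later steps (splitting trees at aligned nodes into synchronous fragments and converting them to rules) are not shown.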