Gappy phrasal alignment by agreement

  • Authors: Mohit Bansal; Chris Quirk; Robert C. Moore

  • Affiliations: UC Berkeley; Microsoft Research; Google Research

  • Venue: HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
  • Year: 2011

Abstract

We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include "gappy phrases" (such as French ne * pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.
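
To make the idea of agreement between two directional models concrete, here is a minimal sketch. It is not the authors' method: the abstract describes agreement as part of the model, whereas this example only combines link posteriors from two already-trained directional models by multiplying them and keeping links above a threshold. The function name `agree`, the threshold value, and all probabilities are hypothetical and chosen purely for illustration.

```python
def agree(fwd, bwd, threshold=0.25):
    """Keep links (i, j) that both directional models support.

    fwd[i][j]: posterior that source word i links to target word j
               under the source-to-target model (assumed given).
    bwd[j][i]: posterior of the same link under the target-to-source
               model (assumed given).
    """
    links = []
    for i, row in enumerate(fwd):
        for j, p in enumerate(row):
            # A link survives only if the product of the two posteriors is high.
            if p * bwd[j][i] >= threshold:
                links.append((i, j))
    return links


# Illustrative (made-up) posteriors for French "ne mange pas" vs. English "not eat".
SOURCE = ["ne", "mange", "pas"]
TARGET = ["not", "eat"]
FWD = [[0.9, 0.1],   # ne
       [0.1, 0.9],   # mange
       [0.8, 0.2]]   # pas
BWD = [[0.7, 0.1, 0.6],     # not
       [0.05, 0.9, 0.05]]   # eat

if __name__ == "__main__":
    for i, j in agree(FWD, BWD):
        print(SOURCE[i], "->", TARGET[j])
```

With these numbers, both "ne" and "pas" link to "not" while "mange" links to "eat", which is the kind of discontinuous alignment that a gappy phrase such as ne * pas is meant to capture.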