Generative models of monolingual and bilingual gappy patterns

  • Authors:
  • Kevin Gimpel;Noah A. Smith

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

A growing body of machine translation research aims to exploit lexical patterns (e.g., n-grams and phrase pairs) with gaps (Simard et al., 2005; Chiang, 2005; Xiong et al., 2011). Typically, these "gappy patterns" are discovered using heuristics based on word alignments or local statistics such as mutual information. In this paper, we develop generative models of monolingual and parallel text that build sentences using gappy patterns of arbitrary length and with arbitrarily many gaps. We exploit Bayesian nonparametrics and collapsed Gibbs sampling to discover salient patterns in a corpus. We evaluate the patterns qualitatively and also add them as features to an MT system, reporting promising preliminary results.