A projection extension algorithm for statistical machine translation

  • Authors:
  • Christoph Tillmann

  • Affiliations:
  • IBM T.J. Watson Research Center, Yorktown Heights, NY

  • Venue:
  • EMNLP '03: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2003

Abstract

In this paper, we describe a phrase-based unigram model for statistical machine translation that uses a much simpler set of model parameters than similar phrase-based models. The units of translation are blocks, that is, pairs of phrases. During decoding, we use a block unigram model and a word-based trigram language model. During training, the blocks are learned from source interval projections using an underlying high-precision word alignment. The system performance is significantly increased by applying a novel block extension algorithm using an additional high-recall word alignment. The blocks are further filtered using unigram-count selection criteria. The system has been successfully tested on a Chinese-English and an Arabic-English translation task.
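
To make the training step in the abstract more concrete, the following is a minimal Python sketch of learning blocks by source interval projection and then filtering them by unigram count. It is not the paper's implementation. The function names (`project_blocks`, `filter_by_count`), the parameters `max_len` and `min_count`, the representation of the alignment as a set of (source index, target index) pairs, and the consistency check are all assumptions made for illustration. The block extension step that uses the high-recall alignment is not covered.

```python
from collections import Counter


def project_blocks(src, tgt, alignment, max_len=4):
    """Collect (source_phrase, target_phrase) blocks from one sentence pair.

    alignment is a set of (source_index, target_index) links, assumed to
    come from a high-precision word alignment. max_len is a hypothetical
    cap on the source interval length.
    """
    blocks = []
    for start in range(len(src)):
        for end in range(start, min(start + max_len, len(src))):
            # Target positions linked to any source word in [start, end].
            projected = [t for (s, t) in alignment if start <= s <= end]
            if not projected:
                continue  # unaligned interval: nothing to project
            t_lo, t_hi = min(projected), max(projected)
            # Assumed consistency check: reject the block if a target word
            # inside [t_lo, t_hi] is linked to a source word outside the
            # source interval.
            inconsistent = any(
                t_lo <= t <= t_hi and not (start <= s <= end)
                for (s, t) in alignment
            )
            if inconsistent:
                continue
            blocks.append(
                (tuple(src[start:end + 1]), tuple(tgt[t_lo:t_hi + 1]))
            )
    return blocks


def filter_by_count(block_lists, min_count=2):
    """Keep blocks seen at least min_count times across all sentence pairs."""
    counts = Counter(b for blocks in block_lists for b in blocks)
    return {b: c for b, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    src = ["das", "haus", "ist", "klein"]
    tgt = ["the", "house", "is", "small"]
    alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
    for block in project_blocks(src, tgt, alignment, max_len=2):
        print(block)
```

In this sketch the surviving counts could be normalized into block unigram probabilities for use at decoding time alongside a trigram language model; how the paper combines the two scores is not stated in the abstract, so that step is left out.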