Improved statistical machine translation using monolingually-derived paraphrases

  • Authors:
  • Yuval Marton;Chris Callison-Burch;Philip Resnik

  • Affiliations:
  • University of Maryland, College Park, MD;Johns Hopkins University, Baltimore, MD;University of Maryland, College Park, MD

  • Venue:
  • EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Untranslated words still constitute a major problem for Statistical Machine Translation (SMT), and current SMT systems are limited by the quantity of parallel training texts. Augmenting the training data with paraphrases generated by pivoting through other languages alleviates this problem, especially for the so-called "low density" languages. But pivoting requires additional parallel texts. We address this problem by deriving paraphrases monolingually, using distributional semantic similarity measures, thus providing access to larger training resources, such as comparable and unrelated monolingual corpora. We present what is to our knowledge the first successful integration of a collocational approach to untranslated words with an end-to-end, state of the art SMT system demonstrating significant translation improvements in a low-resource setting.