Noisy SMS machine translation in low-density languages

  • Authors:
  • Vladimir Eidelman; Kristy Hollingshead; Philip Resnik

  • Affiliations:
  • University of Maryland, College Park; University of Maryland, College Park; University of Maryland, College Park

  • Venue:
  • WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
  • Year:
  • 2011

Abstract

This paper presents the system we developed for the 2011 WMT Haitian Creole–English SMS featured translation task. Applying standard statistical machine translation methods to noisy, real-world SMS data in a low-density language setting such as Haitian Creole poses a unique set of challenges, which we attempt to address in this work. Along with techniques to better exploit the limited available training data, we explore the benefits of several methods for alleviating the additional noise inherent in SMS text and transforming it to better suit the assumptions of our hierarchical phrase-based system. We show that these methods lead to significant improvements in BLEU score over the baseline.