Dialectal to standard Arabic paraphrasing to improve Arabic-English statistical machine translation

  • Authors:
  • Wael Salloum;Nizar Habash

  • Affiliations:
  • Columbia University;Columbia University

  • Venue:
  • DIALECTS '11 Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper is about improving the quality of Arabic-English statistical machine translation (SMT) on dialectal Arabic text using morphological knowledge. We present a light-weight rule-based approach to producing Modern Standard Arabic (MSA) paraphrases of dialectal Arabic out-of-vocabulary (OOV) words and low frequency words. Our approach extends an existing MSA analyzer with a small number of morphological clitics, and uses transfer rules to generate paraphrase lattices that are input to a state-of-the-art phrase-based SMT system. This approach improves BLEU scores on a blind test set by 0.56 absolute BLEU (or 1.5% relative). A manual error analysis of translated dialectal words shows that our system produces correct translations in 74% of the time for OOVs and 60% of the time for low frequency words.