Integration of an Arabic transliteration module into a statistical machine translation system

  • Authors:
  • Mehdi M. Kashani; Eric Joanis; Roland Kuhn; George Foster; Fred Popowich

  • Affiliations:
  • Simon Fraser University, Burnaby, BC, Canada; NRC Institute for Information Technology, Gatineau, QC, Canada; NRC Institute for Information Technology, Gatineau, QC, Canada; NRC Institute for Information Technology, Gatineau, QC, Canada; Simon Fraser University, Burnaby, BC, Canada

  • Venue:
  • StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
  • Year:
  • 2007

Abstract

We provide an in-depth analysis of the integration of an Arabic-to-English transliteration system into a general-purpose phrase-based statistical machine translation system. We study the integration from different aspects and use the BLEU metric to evaluate the improvement that can be attributed to it. Our experiments show that a transliteration module can help significantly when the test data is rich in previously unseen named entities. For OOV words (out-of-vocabulary source words that do not appear in the phrase table), we obtain 70% and 53% of the theoretical maximum improvement, as measured by an oracle, on the development and test sets respectively.
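
One plausible reading of "percentage of the theoretical maximum improvement" is the system's BLEU gain over the baseline divided by the oracle's BLEU gain over the same baseline. The sketch below only illustrates that arithmetic. The function name, the formula itself, and the scores are assumptions made for this example; the abstract does not give the underlying BLEU values or the exact definition used.

```python
def fraction_of_oracle_gain(baseline: float, system: float, oracle: float) -> float:
    """Return the share of the oracle's gain over the baseline that the system recovers.

    Assumed definition (not confirmed by the abstract):
        (system - baseline) / (oracle - baseline)
    """
    headroom = oracle - baseline
    if headroom <= 0:
        raise ValueError("oracle score must exceed the baseline score")
    return (system - baseline) / headroom


# Hypothetical BLEU scores, invented purely to show the calculation.
# They are not results from the paper.
share = fraction_of_oracle_gain(baseline=40.0, system=41.0, oracle=42.0)
print(f"{share:.0%} of the oracle improvement")
```

Under this reading, a value of 1.0 would mean the transliteration module recovers everything the oracle could, and 0.0 would mean it adds nothing over the baseline.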