Methods for integrating rule-based and statistical systems for Arabic to English machine translation

  • Authors:
  • Rabih Zbib;Michael Kayser;Spyros Matsoukas;John Makhoul;Hazem Nader;Hamdy Soliman;Rami Safadi

  • Affiliations:
  • Massachusetts Institute of Technology, Cambridge, USA 02139;BBN Technologies, Cambridge, USA 02138;BBN Technologies, Cambridge, USA 02138;BBN Technologies, Cambridge, USA 02138;Sakhr Software, Free Zone, Nasr City, Cairo, Egypt 11771;Sakhr Software, Free Zone, Nasr City, Cairo, Egypt 11771;Sakhr Software, Free Zone, Nasr City, Cairo, Egypt 11771

  • Venue:
  • Machine Translation
  • Year:
  • 2012

Abstract

This article presents several techniques for integrating information from a rule-based machine translation (RBMT) system into a statistical machine translation (SMT) framework. The techniques are grouped into three parts according to the level at which information is integrated: morphological, lexical, and system. The first part presents techniques that use information from a rule-based morphological tagger to split the Arabic source text into morphemes. We also compare these results with those obtained using a statistical morphological tagger. In the second part, we present two ways of using Arabic diacritics to improve SMT results, both based on binary decision trees. The third part presents a system combination method that combines the outputs of the RBMT and SMT systems, leveraging the strengths of each. This article shows how language-specific information obtained through a deterministic rule-based process can be used to improve SMT, which is mostly language-independent.