Morpho-syntactic Arabic preprocessing for Arabic-to-English statistical machine translation

  • Authors:
  • Anas El Isbihani; Shahram Khadivi; Oliver Bender; Hermann Ney

  • Affiliations:
  • RWTH Aachen University, Aachen, Germany (all authors)

  • Venue:
  • StatMT '06 Proceedings of the Workshop on Statistical Machine Translation
  • Year:
  • 2006

Abstract

Arabic has far richer systems of inflection and derivation than English, which has very little morphology. This morphological difference causes a large gap between the Arabic and English vocabulary sizes in any given parallel training corpus. Segmenting inflected Arabic words is one way to mitigate the effect of this rich morphology. In this paper, we describe several statistically and linguistically motivated methods for Arabic word segmentation. We then show the effectiveness of the proposed methods on the Arabic-English BTEC and NIST tasks.
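
To make the vocabulary-gap argument concrete, the following is a minimal sketch of proclitic segmentation on a toy word list. It is not one of the methods described in the paper: the proclitic list, the stem lexicon, the word list, and the function names `segment` and `vocab_sizes` are all assumptions chosen for illustration. Words are written in Buckwalter transliteration. A prefix is split off only when a known stem remains, so that a word such as "wld" (boy) is not wrongly analysed as "w+" plus "ld".

```python
# Toy illustration only. The proclitic list, stem lexicon, and word list
# below are hypothetical; they are not taken from the paper.
PROCLITICS = ["w", "b", "l", "Al"]   # wa- "and", bi- "with", li- "for", Al- "the"
STEMS = {"ktAb", "byt", "wld"}       # "book", "house", "boy"


def segment(word, stems=STEMS, proclitics=PROCLITICS):
    """Split leading proclitics off `word`, but only if a known stem remains.

    Returns a list of tokens (proclitics marked with a trailing '+'),
    or None when no analysis ending in a known stem exists.
    """
    if word in stems:
        return [word]
    for p in proclitics:
        rest = word[len(p):]
        if word.startswith(p) and rest:
            tail = segment(rest, stems, proclitics)
            if tail is not None:
                return [p + "+"] + tail
    return None


def vocab_sizes(words):
    """Return (unsegmented, segmented) vocabulary sizes for a word list."""
    segmented = set()
    for w in words:
        # Fall back to the unsegmented word when no analysis is found.
        segmented.update(segment(w) or [w])
    return len(set(words)), len(segmented)


WORDS = [
    "ktAb", "AlktAb", "wAlktAb", "bAlktAb", "wktAb",
    "byt", "Albyt", "wAlbyt", "wbyt",
    "wld", "Alwld", "wAlwld",
]

if __name__ == "__main__":
    for w in WORDS:
        print(w, "->", " ".join(segment(w) or [w]))
    print(vocab_sizes(WORDS))
```

On this toy list, the twelve distinct surface forms reduce to three stems plus three proclitic tokens, which is the kind of vocabulary shrinkage the abstract refers to. A lexicon lookup is used here only to keep the sketch short and avoid over-segmentation; a real system would need a far more careful treatment of ambiguity.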