Computer processing of Arabic script-based languages: current state and future directions

  • Authors:
  • Ali Farghaly

  • Affiliations:
  • SYSTRAN Software, Inc., San Diego, CA

  • Venue:
  • Semitic '04 Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages
  • Year:
  • 2004

Abstract

Arabic script-based languages do not belong to a single language family, and therefore exhibit different linguistic properties. To name just a few: Arabic is primarily a VSO language, whereas Farsi is an SVO language and Urdu is an SOV language. Both Farsi and Urdu have light verbs, whereas Arabic does not. Urdu and Arabic have grammatical gender, while Farsi does not. There are, however, linguistic and non-linguistic factors that bring these languages together. On the linguistic side, these are the use of the Arabic script, the right-to-left writing direction, the absence of characters representing short vowels, and the complex word structure. Non-linguistic common properties that bind the majority of speakers of these languages include the Qur'an, which every Muslim has to recite in Arabic, the proximity of the countries where these languages are spoken, a common history and, to a large extent, a common culture and historical influx. It is not surprising, then, that the surge of interest in the study of these languages and the sudden availability of funding to support the development of computational applications that process data in these languages have come for all of these languages at the same time.