Arabic diacritization using weighted finite-state transducers

  • Authors: Rani Nelken; Stuart M. Shieber
  • Affiliations: Harvard University, Cambridge, MA; Harvard University, Cambridge, MA
  • Venue: Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
  • Year: 2005

Abstract

Arabic is usually written without short vowels and additional diacritics, which are nevertheless important for several applications. We present a novel algorithm for restoring these symbols, using a cascade of probabilistic finite-state transducers trained on the Arabic treebank, integrating a word-based language model, a letter-based language model, and an extremely simple morphological model. This combination of probabilistic methods and simple linguistic information yields high levels of accuracy.
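
To make the idea of combining a word-based and a letter-based model concrete, here is a minimal Python sketch. It is not the authors' implementation: it uses plain dictionaries rather than finite-state transducers, unigram counts rather than full language models, and it omits the morphological model. The backoff order (word model first, letter model for unseen words) is an assumption, as are the simplified Buckwalter-style romanization, the tiny invented corpus, and every function name. Costs are negative log-probabilities, the usual weight convention for weighted transducers.

```python
import math
from collections import Counter, defaultdict

# Assumption: simplified Buckwalter-style romanization in which
# a, i, u, o and ~ stand for diacritics; all other symbols are letters.
DIACRITICS = set("aiuo~")

# Toy training data, invented for illustration only (not treebank data).
TOY_CORPUS = ["kataba", "kataba", "kutiba", "darasa", "kitAb"]


def strip_diacritics(word):
    """Drop diacritics, giving the form in which text is usually written."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def split_letters(word):
    """Split a diacritized word into (letter, following diacritics) pairs."""
    pairs = []
    for ch in word:
        if ch not in DIACRITICS:
            pairs.append((ch, ""))
        elif pairs:
            pairs[-1] = (pairs[-1][0], pairs[-1][1] + ch)
    return pairs


def train(corpus):
    """Collect word-level and letter-level counts from diacritized words."""
    word_counts = defaultdict(Counter)    # bare word -> diacritized forms
    letter_counts = defaultdict(Counter)  # letter -> diacritics after it
    for word in corpus:
        word_counts[strip_diacritics(word)][word] += 1
        for letter, marks in split_letters(word):
            letter_counts[letter][marks] += 1
    return word_counts, letter_counts


def cost(count, total):
    """Negative log relative frequency (lower is better)."""
    return -math.log(count / total)


def diacritize_word(bare, word_counts, letter_counts):
    """Return (diacritized form, cost) for one undiacritized word."""
    if bare in word_counts:
        # Word-level model: most frequent diacritization seen in training.
        forms = word_counts[bare]
        best, n = forms.most_common(1)[0]
        return best, cost(n, sum(forms.values()))
    # Letter-level fallback: most frequent diacritics after each letter.
    out, total_cost = [], 0.0
    for letter in bare:
        options = letter_counts.get(letter)
        if not options:
            out.append(letter)  # unseen letter: leave it bare, no cost added
            continue
        marks, n = options.most_common(1)[0]
        out.append(letter + marks)
        total_cost += cost(n, sum(options.values()))
    return "".join(out), total_cost


def diacritize(text, word_counts, letter_counts):
    """Diacritize whitespace-separated words independently."""
    return " ".join(
        diacritize_word(w, word_counts, letter_counts)[0] for w in text.split()
    )


word_counts, letter_counts = train(TOY_CORPUS)
print(diacritize("ktb brd", word_counts, letter_counts))
```

In this sketch "ktb" is resolved by the word-level counts, while "brd" never occurs in the toy corpus and is handled letter by letter. A real cascade would instead score whole sequences, so that the diacritization of one word can depend on its neighbours.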