A hybrid approach for building Arabic diacritizer

  • Authors:
  • Khaled Shaalan;Hitham M. Abo Bakr;Ibrahim Ziedan

  • Affiliations:
  • The British University in Dubai;Zagazig University;Zagazig University

  • Venue:
  • Semitic '09 Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages
  • Year:
  • 2009

Abstract

Modern Standard Arabic is usually written without diacritics, which makes Arabic text processing difficult. Diacritization helps clarify the meaning of words and disambiguates vague spellings or pronunciations, since some Arabic words are spelled the same but differ in meaning. In this paper, we address the problem of adding diacritics to undiacritized Arabic text using a hybrid approach. The approach requires an Arabic lexicon and a large corpus of fully diacritized text for training in order to detect diacritics. Case ending is treated as a separate post-processing task that uses syntactic information. The hybrid approach relies on prioritized lexicon-retrieval, bigram, and SVM-based statistical techniques. We present the results of an evaluation of the proposed diacritization approach and discuss various modifications for improving its performance.
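
The idea of prioritized techniques can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the priority order or any data formats, so the order used here (lexicon, then bigram, then a statistical fallback), the toy transliterated entries, and every function name are assumptions made for illustration. The final stage is only a stand-in for a trained statistical classifier such as an SVM, and case-ending post-processing is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy data in Latin transliteration. Every entry is invented for illustration.
LEXICON: Dict[str, List[str]] = {
    "ktb": ["kataba", "kutub"],  # ambiguous: two candidate diacritizations
    "fy": ["fiy"],               # unambiguous: one candidate
}
# Key: (previous diacritized word, current undiacritized word).
BIGRAMS: Dict[Tuple[str, str], str] = {
    ("fiy", "ktb"): "kutub",
}

Stage = Callable[[Optional[str], str], Optional[str]]


def from_lexicon(prev: Optional[str], word: str) -> Optional[str]:
    """Answer only when the lexicon holds exactly one candidate."""
    candidates = LEXICON.get(word, [])
    return candidates[0] if len(candidates) == 1 else None


def from_bigram(prev: Optional[str], word: str) -> Optional[str]:
    """Answer when the (previous word, current word) pair is known."""
    if prev is None:
        return None
    return BIGRAMS.get((prev, word))


def from_classifier(prev: Optional[str], word: str) -> Optional[str]:
    """Stand-in for a statistical model: first lexicon candidate,
    or the word unchanged when nothing is known about it."""
    candidates = LEXICON.get(word, [])
    return candidates[0] if candidates else word


# Assumed priority order; the last stage always returns a value.
STAGES: List[Stage] = [from_lexicon, from_bigram, from_classifier]


def diacritize(words: List[str]) -> List[str]:
    """Run each word through the stages and keep the first answer."""
    output: List[str] = []
    prev: Optional[str] = None
    for word in words:
        result = word
        for stage in STAGES:
            answer = stage(prev, word)
            if answer is not None:
                result = answer
                break
        output.append(result)
        prev = result
    return output
```

In this sketch, `diacritize(["fy", "ktb"])` resolves the first word from the lexicon and the second from the bigram table, while an unknown word falls through to the last stage and is returned unchanged.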