Memory-based morphological analysis generation and part-of-speech tagging of Arabic

  • Authors:
  • Erwin Marsi; Antal van den Bosch; Abdelhadi Soudi

  • Affiliations:
  • Tilburg University, Tilburg, The Netherlands; Tilburg University, Tilburg, The Netherlands; Ecole Nationale de L'Industrie Minérale, Rabat, Morocco

  • Venue:
  • Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
  • Year:
  • 2005

Abstract

We explore the application of memory-based learning to morphological analysis and part-of-speech tagging of written Arabic, based on data from the Arabic Treebank. Morphological analysis -- the construction of all possible analyses of isolated unvoweled wordforms -- is performed as a letter-by-letter operation prediction task, where the operation encodes segmentation, part-of-speech, character changes, and vocalization. Part-of-speech tagging is carried out by a bi-modular tagger that has a subtagger for known words and one for unknown words. We report on the performance of the morphological analyzer and part-of-speech tagger. We observe that the tagger, which has an accuracy of 91.9% on new data, can be used to select the appropriate morphological analysis of words in context at a precision of 64.0 and a recall of 89.7.
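
The letter-by-letter formulation described in the abstract can be illustrated with a small sketch. It is not the authors' system: it assumes a plain nearest-neighbour classifier with an overlap metric as a stand-in for memory-based learning, a fixed window of two letters on each side, and two invented operation labels (SPLIT, KEEP) that encode segmentation only. The training words are toy transliterated strings chosen for illustration, not Arabic Treebank data, and all function and class names are hypothetical.

```python
from collections import Counter

PAD = "_"  # padding symbol for positions outside the word (assumption)


def windows(word, context=2):
    """Return one fixed-width letter window per position in `word`."""
    padded = PAD * context + word + PAD * context
    width = 2 * context + 1
    return [tuple(padded[i:i + width]) for i in range(len(word))]


def overlap(a, b):
    """Count the positions at which two windows hold the same letter."""
    return sum(x == y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """Stores every training instance and classifies by nearest neighbours."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (window, operation) pairs

    def fit(self, instances):
        self.memory = list(instances)
        return self

    def predict(self, window):
        # Rank stored instances by similarity to the query window.
        ranked = sorted(self.memory,
                        key=lambda item: overlap(window, item[0]),
                        reverse=True)
        # Majority vote among the k most similar instances.
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


# Toy training data: each letter is paired with a hypothetical operation.
# SPLIT = a segment boundary follows this letter; KEEP = no boundary.
TRAINING = [
    ("wktb", ["SPLIT", "KEEP", "KEEP", "KEEP"]),
    ("ktb", ["KEEP", "KEEP", "KEEP"]),
]

instances = [
    (window, op)
    for word, ops in TRAINING
    for window, op in zip(windows(word), ops)
]
classifier = MemoryBasedClassifier(k=1).fit(instances)


def predict_operations(word):
    """Predict one operation per letter of an unseen word."""
    return [classifier.predict(w) for w in windows(word)]


if __name__ == "__main__":
    print(predict_operations("wdrs"))
```

In the task the abstract describes, each operation label would additionally carry part-of-speech, character-change, and vocalization information, and a wordform can have several analyses; the sketch predicts a single segmentation-only label per letter to keep the mechanism visible.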