POS tagging of dialectal Arabic: a minimally supervised approach

  • Authors:
  • Kevin Duh;Katrin Kirchhoff

  • Affiliations:
  • University of Washington, Seattle, WA;University of Washington, Seattle, WA

  • Venue:
  • Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Natural language processing technology for the dialects of Arabic is still in its infancy, due to the problem of obtaining large amounts of text data for spoken Arabic. In this paper we describe the development of a part-of-speech (POS) tagger for Egyptian Colloquial Arabic. We adopt a minimally supervised approach that only requires raw text data from several varieties of Arabic and a morphological analyzer for Modern Standard Arabic. No dialect-specific tools are used. We present several statistical modeling and cross-dialectal data sharing techniques to enhance the performance of the baseline tagger and compare the results to those obtained by a supervised tagger trained on hand-annotated data and, by a state-of-the-art Modern Standard Arabic tagger applied to Egyptian Arabic.