Tagging Urdu text with parts of speech: a tagger comparison

  • Authors:
  • Hassan Sajjad;Helmut Schmid

  • Affiliations:
  • Universität Stuttgart, Stuttgart, Germany;Universität Stuttgart, Stuttgart, Germany

  • Venue:
  • EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, four state-of-art probabilistic taggers i.e. TnT tagger, TreeTagger, RF tagger and SVM tool, are applied to the Urdu language. For the purpose of the experiment, a syntactic tagset is proposed. A training corpus of 100,000 tokens is used to train the models. Using the lexicon extracted from the training corpus, SVM tool shows the best accuracy of 94.15%. After providing a separate lexicon of 70,568 types, SVM tool again shows the best accuracy of 95.66%.