Transformation-based part-of-speech tagging for Serbian language

  • Authors:
  • Vlado Delic; Milan Sečujski; Aleksandar Kupusinac

  • Affiliations:
  • Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia (all authors)

  • Venue:
  • CIMMACS'09: Proceedings of the 8th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics
  • Year:
  • 2009

Abstract

Machine learning techniques based on transformation rules have proven to be a viable alternative to stochastic tagging: they achieve similar accuracy while offering advantages such as simplicity and better portability to other languages. However, data sparsity remains one of the greatest obstacles to tagging languages with complex morphology. The research on part-of-speech (POS) tagging for Serbian described in this paper has produced several original ideas for improving tagging accuracy and for overcoming data-sparsity problems in highly inflected languages. When trained on a previously annotated text corpus of 190,000 words, the POS tagger for Serbian described here achieves an error rate of 10.0%, which is comparable to results reported for some other languages with a similar degree of inflection.
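The abstract does not give the authors' rule templates or their remedies for data sparsity, so the following is only a generic sketch of transformation-based (Brill-style) learning, not the Serbian tagger described in the paper. It assumes a single hypothetical rule template ("change tag A to B when the previous tag is C"), a most-frequent-tag initial annotator with a hypothetical default tag for unknown words, and a tiny invented English corpus treated as one token stream. The names `Rule`, `train_lexicon`, `learn_rules` and `tag` are illustrative only.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical template: change from_tag to to_tag when the previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def train_lexicon(corpus):
    """Map each word to its most frequent tag in the annotated (word, tag) corpus."""
    counts = defaultdict(Counter)
    for word, t in corpus:
        counts[word][t] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def initial_tags(words, lexicon, default="NOUN"):
    """Initial-state annotator: most frequent tag, or an assumed default for unknown words."""
    return [lexicon.get(w, default) for w in words]


def apply_rule(rule, tags):
    """Apply a rule to every position, testing conditions on the tags before this pass."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
            out[i] = rule.to_tag
    return out


def score(rule, tags, gold):
    """Net gain of a rule: errors it fixes minus correct tags it breaks."""
    new = apply_rule(rule, tags)
    fixed = sum(1 for n, t, g in zip(new, tags, gold) if t != g and n == g)
    broken = sum(1 for n, t, g in zip(new, tags, gold) if t == g and n != g)
    return fixed - broken


def learn_rules(words, gold, lexicon, max_rules=10, min_gain=1):
    """Greedily learn an ordered list of transformation rules from annotated text."""
    tags = initial_tags(words, lexicon)
    rules = []
    for _ in range(max_rules):
        # Propose candidates only from currently mistagged positions (sorted for determinism).
        candidates = sorted({Rule(tags[i], gold[i], tags[i - 1])
                             for i in range(1, len(tags)) if tags[i] != gold[i]})
        if not candidates:
            break
        best = max(candidates, key=lambda r: score(r, tags, gold))
        if score(best, tags, gold) < min_gain:
            break
        rules.append(best)
        tags = apply_rule(best, tags)
    return rules


def tag(words, lexicon, rules):
    """Tag new text: initial annotation, then the learned rules in the order learned."""
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(rule, tags)
    return tags


if __name__ == "__main__":
    # Toy English corpus invented for illustration, treated as one token stream.
    corpus = [("I", "PRON"), ("can", "VERB"), ("run", "VERB"),
              ("the", "DET"), ("can", "NOUN"),
              ("we", "PRON"), ("can", "VERB"), ("swim", "VERB")]
    lexicon = train_lexicon(corpus)
    rules = learn_rules([w for w, _ in corpus], [t for _, t in corpus], lexicon)
    print(rules)  # [Rule(from_tag='VERB', to_tag='NOUN', prev_tag='DET')]
    print(tag(["the", "can", "we", "can"], lexicon, rules))  # ['DET', 'NOUN', 'PRON', 'VERB']
```

Scoring each candidate as fixes minus breakages and keeping the best one greedily is what turns the output into an ordered rule list that is replayed on new text. A tagger for a highly inflected language would presumably need many more templates (neighboring words, suffixes, morphological features) and a far larger tagset, which is where the sparsity problem raised in the abstract comes in.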