Transformation-based part-of-speech tagging for Serbian language

Authors:
Vlado Delic; Milan Sečujski; Aleksandar Kupusinac
Affiliation (all authors):
Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
Venue:
CIMMACS'09: Proceedings of the 8th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics
Year:
2009

Abstract

Machine learning techniques based on transformation rules have proven to be a viable alternative to stochastic tagging: they achieve similar accuracy while offering many advantages, such as simplicity and better portability to other languages. However, data sparsity remains one of the greatest obstacles to tagging languages with complex morphology. The research on part-of-speech (POS) tagging for Serbian described in this paper has yielded several original ideas for improving tagging accuracy and for overcoming data-sparsity problems in highly inflected languages. The resulting POS tagger for Serbian achieves an error rate of 10.0% when trained on a previously annotated corpus of 190,000 words, which is comparable to results reported for some other languages with a similar degree of inflection.
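The abstract contrasts transformation rules with stochastic tagging but does not show what a transformation rule is. What follows is a minimal sketch of the general transformation-based, error-driven learning scheme introduced by Brill. Each word first receives its most frequent training tag. The learner then greedily keeps whichever contextual rule most reduces the remaining training errors. This is not the authors' method, which the abstract says adds original ideas for sparse data. The single rule template ("change tag A to B when the previous tag is C"), the coarse tag names, the three-sentence toy corpus, the fixed fallback tag for unknown words, and all function names are hypothetical. The error rate is computed as misclassified tokens over total tokens, the same kind of figure as the 10.0% reported above.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus: (word, tag) pairs with coarse, made-up tags.
# "je" is ambiguous in Serbian: auxiliary "is" vs. clitic pronoun "her".
TRAIN = [
    [("on", "PRON"), ("je", "AUX"), ("dobar", "ADJ")],
    [("ona", "PRON"), ("je", "AUX"), ("tu", "ADV")],
    [("vidim", "VERB"), ("je", "PRON")],
]


def most_frequent_tags(corpus):
    """Initial-state tagger: each word gets the tag it carries most often in training."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, t in sentence:
            counts[word][t] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def apply_rule(tags, rule):
    """Rule (from_tag, to_tag, prev_tag): retag from_tag as to_tag when the
    preceding tag is prev_tag. Contexts are read from the input tags, so all
    matching positions change simultaneously."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(pred, gold):
    return sum(p != g for p, g in zip(pred, gold))


def learn_rules(corpus, lexicon, default_tag, max_rules=10):
    """Greedy error-driven learning: repeatedly keep the rule with the largest
    net error reduction on the training data, then apply it."""
    gold = [[t for _, t in s] for s in corpus]
    current = [[lexicon.get(w, default_tag) for w, _ in s] for s in corpus]
    rules = []
    for _ in range(max_rules):
        # Instantiate candidate rules from the remaining errors (position 0
        # has no left context, so this template cannot fix it).
        candidates = {
            (pred[i], ref[i], pred[i - 1])
            for pred, ref in zip(current, gold)
            for i in range(1, len(pred))
            if pred[i] != ref[i]
        }
        before = sum(count_errors(p, g) for p, g in zip(current, gold))
        best, best_gain = None, 0
        for rule in sorted(candidates):  # sorted for deterministic tie-breaking
            after = sum(count_errors(apply_rule(p, rule), g) for p, g in zip(current, gold))
            if before - after > best_gain:
                best, best_gain = rule, before - after
        if best is None:
            break
        rules.append(best)
        current = [apply_rule(p, best) for p in current]
    return rules


def tag(words, lexicon, rules, default_tag):
    """Tag a sentence: initial lexicon lookup, then the learned rules in order."""
    tags = [lexicon.get(w, default_tag) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return list(zip(words, tags))


def error_rate(corpus, lexicon, rules, default_tag):
    """Fraction of tokens whose predicted tag differs from the gold tag."""
    wrong = total = 0
    for sentence in corpus:
        predicted = tag([w for w, _ in sentence], lexicon, rules, default_tag)
        wrong += sum(p != g for (_, p), (_, g) in zip(predicted, sentence))
        total += len(sentence)
    return wrong / total


if __name__ == "__main__":
    lexicon = most_frequent_tags(TRAIN)
    rules = learn_rules(TRAIN, lexicon, default_tag="NOUN")
    print(rules)  # [('AUX', 'PRON', 'VERB')]
    print(tag(["vidim", "je"], lexicon, rules, "NOUN"))  # [('vidim', 'VERB'), ('je', 'PRON')]
    print(error_rate(TRAIN, lexicon, [], "NOUN"), error_rate(TRAIN, lexicon, rules, "NOUN"))  # 0.125 0.0
```

On this toy corpus the baseline tags every "je" as AUX, giving 1 error in 8 tokens (12.5%). The single learned rule removes that error. Realistic Serbian tagsets typically also encode case, number, gender and other features. This greatly increases the number of distinct tags and rule instantiations, which is why data sparsity matters so much more than in this sketch.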