Feature-rich part-of-speech tagging for morphologically complex languages: application to Bulgarian

  • Authors:
  • Georgi Georgiev; Valentin Zhikov; Petya Osenova; Kiril Simov; Preslav Nakov

  • Affiliations:
  • Ontotext AD, Tsarigradsko Sh., Sofia, Bulgaria; Ontotext AD, Tsarigradsko Sh., Sofia, Bulgaria; IICT, Bulgarian Academy of Sciences, Sofia, Bulgaria; IICT, Bulgarian Academy of Sciences, Sofia, Bulgaria; Qatar Computing Research Institute, Qatar Foundation, Tornado Tower, Doha, Qatar

  • Venue:
  • EACL '12: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2012

Abstract

We present experiments with part-of-speech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving an accuracy of 97.98%, which is a significant improvement over the state of the art for Bulgarian.