Combining Polish morphosyntactic taggers

  • Authors:
  • Tomasz Śniatowski; Maciej Piasecki

  • Affiliations:
  • Institute of Informatics, Wrocław University of Technology, Wrocław, Poland; Institute of Informatics, Wrocław University of Technology, Wrocław, Poland

  • Venue:
  • SIIS'11 Proceedings of the 2011 international conference on Security and Intelligent Information Systems
  • Year:
  • 2011

Abstract

This paper describes the construction of a morphosyntactic tagger for Polish as an ensemble of the best-performing Polish taggers, TaKIPI and Pantera. The set of taggers was extended with RFTagger trained on the Polish corpus. Several methods of ensemble construction were tested; the best result, in terms of tagging error reduction, was achieved with simple, unweighted voting among the three taggers. Two evaluation metrics were used: weak and strong accuracy. The ensemble-based tagger showed a significant increase in both metrics, achieving nearly 94% weak accuracy. This represents an increase of one percentage point over the best individual tagger tested, or an error rate reduction of over 15%.
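
To make the idea of simple, unweighted voting concrete, the following is a minimal sketch and not the authors' implementation. It assumes each tagger outputs exactly one tag per token and that all outputs are aligned to the same tokenization. The function names (`vote`, `combine`, `relative_error_reduction`), the tie-breaking rule (prefer the tag from the tagger listed first), and the placeholder tags are illustrative assumptions; the abstract does not specify how ties are resolved. The last function shows how a percentage-point gain in accuracy relates to a relative error rate reduction, using hypothetical numbers.

```python
from collections import Counter


def vote(tags):
    """Pick the most frequent tag among the taggers' proposals for one token.

    Assumption: on a tie, the tag proposed by the earliest tagger in the
    list wins (the source does not describe its tie-breaking rule).
    """
    counts = Counter(tags)
    top = max(counts.values())
    tied = {tag for tag, count in counts.items() if count == top}
    for tag in tags:  # taggers are listed in priority order
        if tag in tied:
            return tag


def combine(outputs):
    """Combine aligned tag sequences, one sequence per tagger."""
    if len({len(seq) for seq in outputs}) != 1:
        raise ValueError("all taggers must tag the same number of tokens")
    return [vote(list(token_tags)) for token_tags in zip(*outputs)]


def relative_error_reduction(acc_base, acc_new):
    """Fraction of the baseline's errors removed by the new system."""
    err_base = 1.0 - acc_base
    err_new = 1.0 - acc_new
    return (err_base - err_new) / err_base


# Placeholder tags for three tokens from three hypothetical taggers.
outputs = [
    ["A", "B", "C"],
    ["A", "X", "D"],
    ["Z", "X", "E"],
]
combined = combine(outputs)

# Hypothetical accuracies: 0.90 -> 0.92 removes about a fifth of the errors.
reduction = relative_error_reduction(0.90, 0.92)
```

In this sketch the third token receives three different proposals, so the first tagger's tag is kept. Ordering the taggers by their individual accuracy would be one reasonable way to choose that priority.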