Combining Polish morphosyntactic taggers

Authors:
Tomasz Śniatowski; Maciej Piasecki
Affiliations:
Institute of Informatics, Wrocław University of Technology, Wrocław, Poland; Institute of Informatics, Wrocław University of Technology, Wrocław, Poland
Venue:
SIIS'11 Proceedings of the 2011 international conference on Security and Intelligent Information Systems
Year:
2011



Abstract

This paper describes the construction of a morphosyntactic tagger for Polish as an ensemble of the best-performing Polish taggers, TaKIPI and Pantera. The set of taggers was extended with RFTagger trained on the Polish corpus. Several methods of ensemble construction were tested; the best result, in terms of tagging error reduction, was achieved with simple, unweighted voting among the three taggers. Two evaluation metrics were used: weak and strong accuracy. The ensemble-based tagger showed a significant improvement on both metrics, reaching nearly 94% weak accuracy. This is a one percentage point gain over the best individual tagger tested, corresponding to an error rate reduction of over 15%.