This paper describes the construction of a morpho-syntactic tagger for Polish as an ensemble of the best-performing Polish taggers, TaKIPI and Pantera. The set of taggers was extended with RFTagger trained on the Polish corpus. Several methods of ensemble construction were tested; the largest reduction in tagging error was achieved with simple, unweighted voting among the three taggers. Two evaluation metrics were used: weak and strong accuracy. The ensemble-based tagger improved significantly on both metrics, reaching nearly 94% weak correctness. This is one percentage point above the best individual tagger tested, or an error rate reduction of over 15%.
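The unweighted voting mentioned above can be illustrated with a minimal sketch. It assumes each tagger emits exactly one tag string per token over the same tokenization, and that ties (all three taggers disagree) fall back to the tagger listed first. The abstract does not say how ties are broken, so that rule, the function names `vote` and `ensemble_tag`, and the example tags are hypothetical rather than taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def vote(tags: Sequence[str]) -> str:
    """Return the tag proposed by the most taggers for one token.

    Every tagger has equal weight. On a tie, the tag of the
    earliest-listed tagger among the tied candidates wins
    (an assumption; the source does not specify tie-breaking).
    """
    if not tags:
        raise ValueError("at least one tagger output is required")
    counts = Counter(tags)
    best = max(counts.values())
    for tag in tags:  # iterate in tagger order to resolve ties
        if counts[tag] == best:
            return tag
    raise AssertionError("unreachable: some tag always has the top count")


def ensemble_tag(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tag sequences from several taggers by voting.

    `outputs[i][j]` is the tag that tagger i assigned to token j.
    """
    if len({len(seq) for seq in outputs}) != 1:
        raise ValueError("all taggers must tag the same number of tokens")
    return [vote(column) for column in zip(*outputs)]


# Hypothetical outputs of three taggers for a three-token sentence.
tagger_a = ["subst:sg:nom", "fin:sg:ter", "adj:sg:acc"]
tagger_b = ["subst:sg:nom", "fin:sg:ter", "adj:sg:nom"]
tagger_c = ["subst:sg:acc", "fin:sg:ter", "adj:sg:nom"]

combined = ensemble_tag([tagger_a, tagger_b, tagger_c])
```

With three voters, a tie can only occur when all three taggers propose different tags, so the order of the input list decides which tagger acts as the fallback.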