Application of stacked methods to part-of-speech tagging of Polish

Authors:
Marcin Kuta; Wojciech Wójcik; Michał Wrzeszcz; Jacek Kitowski
Affiliations:
Institute of Computer Science, AGH-UST, Kraków, Poland; Institute of Computer Science, AGH-UST, Kraków, Poland; Institute of Computer Science, AGH-UST, Kraków, Poland; Institute of Computer Science, AGH-UST, Kraków, Poland
Venue:
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Year:
2009

Abstract

We compare the accuracy of several single and combined part-of-speech tagging methods applied to Polish. The methods are evaluated on the modified corpus of the Frequency Dictionary of Contemporary Polish (m-FDCP). Three well-known combination methods (weighted voting, distributed voting, and stacking) are analyzed. We also propose two new weighted voting methods, MorphCatPrecision and AmbClassPrecision. MorphCatPrecision achieves the highest accuracy among all the weighted voting methods considered. The best combination method achieves an 11.9% error reduction with respect to the best baseline tagger. We also report the statistical significance of the differences in accuracy between the methods, measured with the McNemar test. Most of these algorithms have high time and memory requirements, so the selection of the best algorithms was conducted on a multiprocessor supercomputer.
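The combination and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The helpers `precision_by`, `weighted_vote`, `mcnemar`, and `error_reduction` are hypothetical names introduced for illustration.

Each tagger's vote for a token is weighted by that tagger's precision on the tag it proposes. This follows the general precision-weighted voting scheme, with precisions that would normally be estimated on held-out data. The optional `key` argument maps a tag to a coarser bucket, such as the part-of-speech prefix of a colon-separated tag. It is only an illustrative guess at how a category-level weighting like MorphCatPrecision might be parameterized, since the exact definitions of MorphCatPrecision and AmbClassPrecision are not given here. The McNemar test uses the standard continuity-corrected statistic with one degree of freedom.

```python
import math
from collections import Counter, defaultdict


def identity(tag):
    return tag


def precision_by(predictions, gold, key=identity):
    """Per-bucket precision of one tagger: among tokens where the tagger
    proposed a tag in bucket key(tag), the fraction it tagged correctly."""
    proposed = Counter(key(p) for p in predictions)
    correct = Counter(key(p) for p, g in zip(predictions, gold) if p == g)
    return {bucket: correct[bucket] / n for bucket, n in proposed.items()}


def weighted_vote(votes, precisions, key=identity):
    """Combine one token's votes: each tagger adds its precision on the
    bucket of the tag it proposes, and the highest-scoring tag wins.
    Passing a coarser `key` is a hypothetical category-level variant."""
    scores = defaultdict(float)
    for tag, prec in zip(votes, precisions):
        scores[tag] += prec.get(key(tag), 0.0)  # unseen bucket: zero weight
    # Deterministic tie-break (an assumption): smallest tag in sort order.
    return max(sorted(scores), key=scores.__getitem__)


def mcnemar(pred_a, pred_b, gold):
    """McNemar test with continuity correction; returns (chi2, p_value)."""
    rows = list(zip(pred_a, pred_b, gold))
    b = sum(1 for a, p, g in rows if a == g and p != g)  # only A correct
    c = sum(1 for a, p, g in rows if a != g and p == g)  # only B correct
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def error_reduction(acc_base, acc_comb):
    """Fraction of the baseline tagger's errors removed by the combination."""
    return (acc_comb - acc_base) / (1.0 - acc_base)


if __name__ == "__main__":
    # Toy data; in practice the weights come from a held-out tuning set.
    gold = ["A", "B", "A", "C", "B", "A"]
    taggers = [
        ["A", "B", "B", "C", "B", "A"],
        ["A", "A", "A", "C", "A", "A"],
        ["B", "B", "A", "A", "B", "A"],
    ]
    precisions = [precision_by(t, gold) for t in taggers]
    combined = [weighted_vote(v, precisions) for v in zip(*taggers)]
    print(combined)
    print(mcnemar(combined, taggers[0], gold))
```

In this reading, an 11.9% error reduction means that (acc_comb − acc_base) / (1 − acc_base) = 0.119. In other words, the combined tagger makes 11.9% fewer errors than the best baseline tagger. A key such as `lambda t: t.split(":")[0]` would pool precision over all tags that share a part-of-speech prefix. Whether that corresponds to MorphCatPrecision is an assumption.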