We compare the accuracy of several single and combined part-of-speech tagging methods applied to Polish and evaluated on the modified corpus of the Frequency Dictionary of Contemporary Polish (m-FDCP). We analyze three well-known combination methods (weighted voting, distributed voting, and stacking) and propose two new weighted voting methods, MorphCatPrecision and AmbClassPrecision. MorphCatPrecision achieves the highest accuracy among all weighted voting methods considered. The best combination method achieves an 11.9% error reduction with respect to the best baseline tagger. We also report the statistical significance of the differences in accuracy between the methods, measured with the McNemar test. Selection of the best algorithms was conducted on a multiprocessor supercomputer because most of these algorithms have high time and memory requirements.
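The two evaluation quantities mentioned above, relative error reduction and McNemar significance, can be illustrated with a minimal sketch. It assumes that error reduction is measured relative to the baseline error rate and that an exact two-sided McNemar test is applied to per-token correctness; the abstract does not say which variant of the test was used. The function names, tags, and toy data are hypothetical and are not taken from the paper.

```python
from math import comb


def relative_error_reduction(acc_baseline, acc_combined):
    """Fraction of the baseline's errors removed by the combined tagger."""
    err_base = 1.0 - acc_baseline
    err_comb = 1.0 - acc_combined
    return (err_base - err_comb) / err_base


def mcnemar_exact(gold, pred_a, pred_b):
    """Exact two-sided McNemar test on two taggers' per-token correctness.

    Returns (b, c, p_value), where b counts tokens only tagger A got right
    and c counts tokens only tagger B got right.
    """
    b = sum(1 for g, a, x in zip(gold, pred_a, pred_b) if a == g and x != g)
    c = sum(1 for g, a, x in zip(gold, pred_a, pred_b) if a != g and x == g)
    n = b + c
    if n == 0:
        # The taggers never disagree in correctness: no evidence of a difference.
        return b, c, 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Toy data with hypothetical tags (not from m-FDCP).
gold = ["subst", "adj", "fin", "prep", "subst", "adv"]
tagger_a = ["subst", "subst", "fin", "prep", "adj", "adj"]
tagger_b = ["subst", "adj", "fin", "prep", "subst", "adv"]

print(mcnemar_exact(gold, tagger_a, tagger_b))
print(relative_error_reduction(0.90, 0.92))
```

Only tokens on which exactly one tagger is correct enter the test, so two taggers with similar overall accuracy can still differ significantly if their errors fall on different tokens.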