This paper addresses the issue of POS tagger evaluation. Such evaluation is usually performed by comparing the tagger output with a reference test corpus, which is assumed to be error-free. However, commonly used corpora contain annotation noise, which makes the measured performance a distortion of the true value. We analyze to what extent this distortion may invalidate comparisons between taggers, or measurements of the improvement brought by a new system. The main conclusion is that a more rigorous experimental design is needed to reliably evaluate and compare tagger accuracies.
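The distortion introduced by a noisy gold standard can be illustrated with a small simulation. The sketch below is hypothetical and not taken from the paper: it assumes tagger errors and annotation errors are independent, and that a wrong tag is drawn uniformly from the remaining tagset. Under these assumptions, a tagger whose true accuracy is 97%, scored against a test corpus with 3% annotation noise, appears to perform noticeably worse than it really does.

```python
import random

def measured_accuracy(true_acc, gold_err, n=100_000, n_tags=10, seed=0):
    """Simulate the accuracy a tagger appears to have when scored
    against a noisy gold standard.

    Assumptions (illustrative only): tagger errors and gold-standard
    errors are independent, and an erroneous tag is uniform over the
    other n_tags - 1 tags.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        truth = rng.randrange(n_tags)
        # Tagger output: correct with probability true_acc,
        # otherwise a uniformly chosen wrong tag.
        if rng.random() < true_acc:
            tagger = truth
        else:
            tagger = (truth + rng.randrange(1, n_tags)) % n_tags
        # Gold annotation: wrong with probability gold_err.
        if rng.random() < gold_err:
            gold = (truth + rng.randrange(1, n_tags)) % n_tags
        else:
            gold = truth
        agree += (tagger == gold)
    return agree / n

# With 97% true accuracy and 3% gold-standard noise, the measured
# accuracy drops to roughly true_acc * (1 - gold_err) ~ 0.94.
print(measured_accuracy(true_acc=0.97, gold_err=0.03))
```

Note that the ~3-point gap between true and measured accuracy is of the same order as the differences often reported between competing taggers, which is precisely why the paper argues such comparisons can be unreliable.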