This paper addresses the issue of POS tagger evaluation. Such evaluation is usually performed by comparing the tagger output with a reference test corpus, which is assumed to be error-free. However, commonly used corpora contain annotation noise, which makes the measured performance a distortion of the true value. We analyze to what extent this distortion may invalidate comparisons between taggers, or measurements of the improvement brought by a new system. The main conclusion is that a more rigorous experimental design is needed to reliably evaluate and compare tagger accuracies.
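The distortion introduced by a noisy gold standard can be illustrated with a small simulation. The sketch below is hypothetical and not taken from the paper: it assumes tagger errors and annotation errors are independent, and that a wrong tag is drawn uniformly from the remaining tagset. Under these assumptions, a tagger whose true accuracy is 97%, scored against a test corpus with 3% annotation noise, appears to perform noticeably worse than it really does.

```python
import random

def measured_accuracy(true_acc, gold_err, n=100_000, n_tags=10, seed=0):
    """Simulate the accuracy a tagger appears to have when scored
    against a noisy gold standard.

    Assumptions (illustrative only): tagger errors and gold-standard
    errors are independent, and an erroneous tag is uniform over the
    other n_tags - 1 tags.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        truth = rng.randrange(n_tags)
        # Tagger output: correct with probability true_acc,
        # otherwise a uniformly chosen wrong tag.
        if rng.random() < true_acc:
            tagger = truth
        else:
            tagger = (truth + rng.randrange(1, n_tags)) % n_tags
        # Gold annotation: wrong with probability gold_err.
        if rng.random() < gold_err:
            gold = (truth + rng.randrange(1, n_tags)) % n_tags
        else:
            gold = truth
        agree += (tagger == gold)
    return agree / n

# With 97% true accuracy and 3% gold-standard noise, the measured
# accuracy drops to roughly true_acc * (1 - gold_err) ~ 0.94.
print(measured_accuracy(true_acc=0.97, gold_err=0.03))
```

Note that the ~3-point gap between true and measured accuracy is of the same order as the differences often reported between competing taggers, which is precisely why the paper argues such comparisons can be unreliable.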