Learning with annotation noise

  • Authors: Eyal Beigman; Beata Beigman Klebanov
  • Affiliations: Washington University in St. Louis; Northwestern University
  • Venue: ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 1
  • Year: 2009

Abstract

It is usually assumed that the kind of noise existing in annotated data is random classification noise. Yet there is evidence that differences between annotators are not always random attention slips but could result from different biases towards the classification categories, at least for the harder-to-decide cases. Under an annotation generation model that takes this into account, there is a hazard that some of the training instances are actually hard cases with unreliable annotations. We show that these are relatively unproblematic for an algorithm operating under the 0–1 loss model, whereas for the commonly used voted perceptron algorithm, hard training cases could result in incorrect predictions on the uncontroversial cases at test time.
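
The failure mode the abstract describes can be explored in a small simulation: draw linearly separable data, let a biased annotator relabel the "hard" (small-margin) instances, train a voted perceptron (Freund and Schapire, 1999) on the noisy labels, and measure error on the easy, uncontroversial test cases. The sketch below is illustrative only; the margin thresholds and the annotator bias parameter are assumptions made for the demo, not the paper's exact annotation model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Annotation generation model (demo assumptions): easy cases keep the
# true label; "hard" (small-margin) cases get a label drawn from an
# annotator biased toward the positive class, independent of the truth.
n, d, bias = 500, 2, 0.8
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -1.0])
y = np.sign(X @ w_true)
hard = np.abs(X @ w_true) < 0.3          # small margin = hard case (assumption)
y[hard] = np.where(rng.random(hard.sum()) < bias, 1.0, -1.0)

# Voted perceptron (Freund and Schapire, 1999): keep every intermediate
# hypothesis together with the number of rounds it survived, and predict
# by a survival-weighted vote of the hypotheses' individual predictions.
def train_voted_perceptron(X, y, epochs=10):
    w, c = np.zeros(X.shape[1]), 0
    hypotheses = []                       # list of (weight vector, survival count)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:        # mistake: retire w, start a new hypothesis
                hypotheses.append((w.copy(), c))
                w, c = w + yi * xi, 1
            else:
                c += 1
    hypotheses.append((w.copy(), c))
    return hypotheses

def predict(hypotheses, x):
    vote = sum(c * np.sign(w @ x) for w, c in hypotheses)
    return 1.0 if vote >= 0 else -1.0

# Evaluate only on uncontroversial (large-margin) test points, mirroring
# the paper's concern: errors induced on the easy cases at test time.
hyps = train_voted_perceptron(X, y)
X_test = rng.normal(size=(2000, d))
easy = np.abs(X_test @ w_true) > 0.5
preds = np.array([predict(hyps, x) for x in X_test[easy]])
err = np.mean(preds != np.sign(X_test[easy] @ w_true))
print(f"voted-perceptron error on easy test cases: {err:.3f}")
```

Varying the bias parameter and the hard-case margin lets one observe how biased labels concentrated on the hard instances can pull the weighted vote away from the true boundary even in the easy region, which is the contrast with 0–1 loss minimization that the paper analyzes.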