This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling it. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give a considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a robust technique that combines different notions of uncertainty to select data points whose quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
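Claim (i) above — that repeated labeling can improve label quality, but not always — can be illustrated with a simple back-of-the-envelope model. The sketch below (an illustrative simplification, not the paper's own selective-acquisition technique) assumes each of n independent labelers is correct with the same probability p, and that the integrated label is the majority vote; the function name and setup are hypothetical.

```python
import math

def majority_label_quality(p: float, n: int) -> float:
    """Probability that the majority vote of n independent labelers,
    each correct with probability p, equals the true label.
    Uses an odd n to avoid ties."""
    assert n % 2 == 1, "use an odd number of labels to avoid ties"
    # Sum binomial probabilities of getting a correct majority,
    # i.e. more than n/2 correct votes.
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# With better-than-chance labelers (p > 0.5), quality rises with
# more labels; with worse-than-chance labelers, it degrades.
for n in (1, 3, 5, 11):
    print(n, round(majority_label_quality(0.7, n), 3))
```

Under these assumptions, quality improves monotonically with n only when p > 0.5, which is consistent with the abstract's hedge that repeated labeling helps "but not always": when labelers are at or below chance, more labels do not fix the problem.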