This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality from repeated labeling, and focus especially on the improvement of training labels for supervised induction of predictive models. With the outsourcing of small tasks becoming easier, for example via Amazon's Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling it. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated labeling can improve both label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give a considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a set of robust techniques that combine different notions of uncertainty to select the data points whose quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
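A minimal sketch of the core idea follows. It is not the authors' code; the function names, the uniform labeler accuracy p, the independence assumption, and the simple disagreement-based selection score are all illustrative assumptions. The sketch integrates repeated noisy labels by majority vote, computes the expected quality of the integrated label for independent labelers of equal accuracy, and scores examples by labeler disagreement so that the most uncertain ones can be sent out for another label.

from collections import Counter
from math import comb

def integrate_labels(labels):
    """Majority vote over a list of (possibly noisy) binary labels."""
    return Counter(labels).most_common(1)[0][0]

def majority_vote_quality(p, n):
    """Probability that a majority vote of n independent labelers, each
    correct with probability p, yields the correct label (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def disagreement(labels):
    """Fraction of minority votes; a crude uncertainty score for deciding
    which examples should receive an additional label next."""
    counts = Counter(labels)
    return 1 - counts.most_common(1)[0][1] / sum(counts.values())

if __name__ == "__main__":
    votes = [1, 0, 1, 1, 0]
    print(integrate_labels(votes))                   # -> 1
    print(round(majority_vote_quality(0.7, 5), 3))   # -> 0.837
    print(round(disagreement(votes), 2))             # -> 0.4

Under these assumptions, five labels from labelers who are individually correct only 70% of the time yield an integrated label that is correct about 84% of the time, illustrating the kind of quality improvement the abstract describes; the disagreement score is just one example of the notions of uncertainty that selective repeated-labeling strategies can combine.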