This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling it. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give a considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a robust technique that combines different notions of uncertainty to select data points whose quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
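Claim (i) above — that repeated labeling can improve label quality, but not always — can be illustrated with a simple back-of-the-envelope model. The sketch below (an illustrative simplification, not the paper's own selective-acquisition technique) assumes each of n independent labelers is correct with the same probability p, and that the integrated label is the majority vote; the function name and setup are hypothetical.

```python
import math

def majority_label_quality(p: float, n: int) -> float:
    """Probability that the majority vote of n independent labelers,
    each correct with probability p, equals the true label.
    Uses an odd n to avoid ties."""
    assert n % 2 == 1, "use an odd number of labels to avoid ties"
    # Sum binomial probabilities of getting a correct majority,
    # i.e. more than n/2 correct votes.
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# With better-than-chance labelers (p > 0.5), quality rises with
# more labels; with worse-than-chance labelers, it degrades.
for n in (1, 3, 5, 11):
    print(n, round(majority_label_quality(0.7, n), 3))
```

Under these assumptions, quality improves monotonically with n only when p > 0.5, which is consistent with the abstract's hedge that repeated labeling helps "but not always": when labelers are at or below chance, more labels do not fix the problem.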