In this paper, we analyze a crowdsourcing system consisting of a set of users and a set of binary choice questions. Each user has an unknown, fixed reliability that determines the user's error rate in answering questions. The problem is to determine the truth values of the questions based solely on the users' answers. Although this problem has been studied extensively, theoretical error bounds have been shown only for restricted settings, in which the graph between users and questions is either random or complete. We consider a general setting of the problem in which the user-question graph can be arbitrary. We obtain bounds on the error rate of our algorithm and show that it is governed by the expansion of the graph. We demonstrate, using several synthetic and real datasets, that our algorithm outperforms the state of the art.
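To make the setting concrete, the following is a minimal sketch of the model described above, not the paper's algorithm. It assumes labels in {+1, -1}, treats each user's reliability as the probability of answering correctly, and aggregates with a plain majority vote as a baseline. The names `simulate_answers` and `majority_vote`, the example graph, and the reliability values are all hypothetical and chosen only for illustration.

```python
import random


def simulate_answers(truth, reliability, edges, seed=0):
    """Draw one answer per (user, question) edge of an arbitrary bipartite graph.

    Assumption: user u reports the true label with probability reliability[u]
    and the flipped label otherwise.
    """
    rng = random.Random(seed)
    answers = {}
    for user, question in edges:
        is_correct = rng.random() < reliability[user]
        answers[(user, question)] = truth[question] if is_correct else -truth[question]
    return answers


def majority_vote(answers, num_questions):
    """Baseline aggregator: sign of the summed answers per question (ties go to +1)."""
    totals = [0] * num_questions
    for (_, question), answer in answers.items():
        totals[question] += answer
    return [1 if total >= 0 else -1 for total in totals]


# Hypothetical instance: 3 users, 4 questions, an arbitrary user-question graph.
truth = [1, -1, 1, -1]
reliability = [0.9, 0.8, 0.6]
edges = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (1, 2), (2, 2), (0, 3), (2, 3)]

answers = simulate_answers(truth, reliability, edges)
estimate = majority_vote(answers, len(truth))
error_rate = sum(e != t for e, t in zip(estimate, truth)) / len(truth)
print("estimated labels:", estimate)
print("error rate:", error_rate)
```

Majority voting ignores differences in user reliability and the structure of the graph, which is why it serves here only as a point of comparison for methods that exploit both.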