Controlling the quality of tasks, i.e., the propriety of posted jobs, is a major challenge in crowdsourcing marketplaces. Most existing crowdsourcing services prohibit requesters from posting illegal or objectionable tasks, so marketplace operators must continuously monitor posted tasks to find improper ones; however, manually investigating every task is very expensive. In this paper, we present the results of a trial study on the automatic detection of improper tasks to support marketplace operators' monitoring activities. In experiments on real task data from a commercial crowdsourcing marketplace, we showed that a classifier trained on the operators' judgments achieves high performance in detecting improper tasks. By analyzing the estimated classifier, we identified several features that are effective for detecting improper tasks, such as the words appearing in the task description, the amount of money each worker will receive for the task, and the type of worker-qualification option set for the task. In addition, to reduce the operators' annotation costs and improve classification performance, we considered using crowdsourcing itself for task annotation: we hired a group of (non-expert) crowdsourcing workers to monitor posted tasks and used their judgments to train the classifier. We confirmed that applying quality-control techniques to handle the variability in worker reliability improved the performance of the classifier. Finally, our results showed that combining the non-expert judgments of crowdsourcing workers with expert judgments improves the performance of detecting improper crowdsourcing tasks, and that using crowdsourced labels reduces the required number of expert judgments by 25% while maintaining the same level of detection performance.
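The quality-control step described above can be illustrated with a minimal sketch: noisy per-task judgments from several non-expert workers are aggregated by majority vote before being used as training labels. All task IDs, labels, and worker counts here are hypothetical; the paper's actual quality-control technique may instead weight each worker by an estimated reliability.

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate noisy worker labels for one task by simple majority vote.

    judgments: list of 'improper'/'proper' labels, one per worker.
    Returns the most common label (a crude way to handle variability
    in worker reliability).
    """
    counts = Counter(judgments)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical judgments by three non-expert workers per posted task.
task_judgments = {
    "task-1": ["improper", "improper", "proper"],
    "task-2": ["proper", "proper", "proper"],
    "task-3": ["improper", "proper", "improper"],
}

# Aggregated labels, usable as (noisy) training data for the classifier.
aggregated = {task: majority_vote(j) for task, j in task_judgments.items()}
```

In this sketch, `aggregated` maps each task to a single consensus label, which could then be mixed with a smaller number of expert (operator) judgments when training the improper-task classifier.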