Crowdsourcing has recently become popular among machine learning researchers and social scientists as an effective way to collect large-scale experimental data from distributed workers. To extract useful information from these cheap but potentially unreliable answers, a key problem is to identify reliable workers as well as unambiguous tasks. For objective tasks with a single correct answer, previous work can estimate worker reliability and task clarity under the single-gold-standard assumption; for subjective tasks that admit multiple reasonable answers around which workers cluster, a phenomenon called schools of thought, existing models cannot be applied directly. In this work, we present a statistical model that estimates worker reliability and task clarity without resorting to the single-gold-standard assumption. The model explicitly characterizes the grouping behavior that forms schools of thought through a rank-1 factorization of a worker-task group-size matrix. Instead of performing an intermediate inference step, which can be expensive and unstable, we give an algorithm that computes the sizes of the different groups analytically. Extensive empirical studies on real data collected from Amazon Mechanical Turk show that our method discovers the schools of thought, yields reasonable estimates of worker reliability and task clarity, and is robust to hyperparameter changes. Furthermore, the estimated worker reliability can be used to improve gold-standard prediction for objective tasks.
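To make the rank-1 idea concrete, below is a minimal Python sketch of one plausible reading of the pipeline: build a worker-task group-size matrix by counting, for each task, how many workers gave the same answer as a given worker, then take the matrix's best rank-1 approximation and read the two factors as worker-reliability and task-clarity scores. The matrix construction, the SVD-based fit, and the names `group_size_matrix` and `rank1_factorize` are illustrative assumptions; the abstract's statistical model and analytic group-size computation are not reproduced here.

```python
# Hypothetical sketch: rank-1 factorization of a worker-task
# group-size matrix. Not the authors' algorithm; an SVD stands in
# for their statistical model.
import numpy as np
from collections import Counter

def group_size_matrix(answers):
    """answers[i][j] is worker i's answer to task j.
    S[i, j] = number of workers who gave the same answer as
    worker i on task j (the size of the 'school' worker i joined)."""
    n_workers, n_tasks = len(answers), len(answers[0])
    S = np.zeros((n_workers, n_tasks))
    for j in range(n_tasks):
        counts = Counter(answers[i][j] for i in range(n_workers))
        for i in range(n_workers):
            S[i, j] = counts[answers[i][j]]
    return S

def rank1_factorize(S):
    """Best rank-1 approximation S ~ r c^T from the top singular pair;
    interpret r as per-worker reliability and c as per-task clarity
    (each determined only up to a common scale)."""
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    r = U[:, 0] * np.sqrt(sigma[0])
    c = Vt[0, :] * np.sqrt(sigma[0])
    # Resolve the SVD sign ambiguity so scores are non-negative.
    if r.sum() < 0:
        r, c = -r, -c
    return r, c

# Toy usage: 4 workers, 3 tasks; the last worker often disagrees,
# so its reliability score should come out lowest.
answers = [["a", "x", "p"],
           ["a", "x", "p"],
           ["a", "y", "p"],
           ["b", "y", "q"]]
reliability, clarity = rank1_factorize(group_size_matrix(answers))
print("worker reliability:", reliability)
print("task clarity:", clarity)
```

A rank-1 fit is the natural first cut here because it forces every group-size entry to be explained as (worker effect) x (task effect), which matches the abstract's framing of reliability and clarity as separate per-worker and per-task quantities.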