A lightweight combinatorial approach for inferring the ground truth from multiple annotators

Authors:
Xiang Liu;Liyun Li;Nasir Memon
Affiliations:
Metrotech Center, Polytechnic Institute of New York University, Brooklyn, NY;Linkedin Corporation, Mountain View;Metrotech Center, Polytechnic Institute of New York University, Brooklyn, NY
Venue:
MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2013

Citing 6
Cited 0

Empirical Support for Winnow and Weighted-MajorityAlgorithms: Results on a Calendar Scheduling Domain

Machine Learning
Get another label? improving data quality and data mining using multiple, noisy labelers

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Supervised learning from multiple experts: whom to trust when everyone lies a bit

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Learning From Crowds

The Journal of Machine Learning Research
Combining human and machine intelligence in large-scale crowdsourcing

Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the increasing importance of producing large-scale labeled datasets for training, testing and validation, services such as Amazon Mechanical Turk (MTurk) are becoming more and more popular to replace the tedious task of manual labeling finished by hand. However, annotators in these crowdsourcing services are known to exhibit different levels of skills, consistencies and even biases, making it difficult to estimate the ground truth class label from the imperfect labels provided by these annotators. To solve this problem, we present a discriminative approach to infer the ground truth class labels by mapping both annotators and the tasks into a low-dimensional space. Our proposed model is inherently combinatorial and therefore does not require any prior knowledge about the annotators or the examples, thereby providing more simplicity and computational efficiency than the state-of-the-art Bayesian methods. We also show that our lightweight approach is, experimentally on real datasets, more accurate than either majority voting or weighted majority voting.