This paper addresses a key challenge in crowdsourcing: extracting accurate labels from noisy crowdsourced datasets. Prior work has focused on modeling the reliability of individual workers, for instance via per-worker confusion matrices, and on using these latent traits to estimate the true labels more accurately. However, this strategy becomes ineffective when there are too few labels per worker to estimate each worker's quality reliably. To mitigate this issue, we propose a novel community-based Bayesian label aggregation model, CommunityBCC, which assumes that crowd workers conform to a few different types, where each type represents a group of workers with similar confusion matrices. Each worker belongs to a single community, and the worker's confusion matrix is similar to (a perturbation of) that community's confusion matrix. Our model can then learn a set of key latent variables: (i) the confusion matrix of each community, (ii) the community membership of each worker, and (iii) the aggregated label of each item. We compare our model against established aggregation methods on a number of large-scale, real-world crowdsourcing datasets. Our experimental results show that CommunityBCC consistently outperforms state-of-the-art label aggregation methods, requiring, on average, 50% less data to reach 90% accuracy.
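To make the generative assumptions concrete, below is a minimal NumPy sketch of the model structure described in the abstract. This is not the authors' implementation: all dimensions, priors, variable names, and the concentration value are illustrative assumptions. It samples community-level confusion matrices, draws each worker's matrix as a Dirichlet perturbation of its community's matrix, and then generates crowd labels from latent true labels; the actual CommunityBCC model would invert this process with Bayesian inference to recover the latent variables (i)-(iii).

```python
import numpy as np

rng = np.random.default_rng(0)

# Model dimensions (illustrative values, not from the paper).
K = 3    # number of label classes
M = 4    # number of worker communities
W = 50   # number of workers
N = 200  # number of items
L = 5    # labels collected per item

def sample_confusion(prior_rows):
    """Sample a K x K confusion matrix, one Dirichlet draw per true-label row."""
    return np.stack([rng.dirichlet(row) for row in prior_rows])

# (i) Community confusion matrices: row c is the label distribution a
# community member produces when the true label is c. The diagonal-heavy
# pseudo-count prior encodes mostly-correct communities.
diag_prior = np.eye(K) * 10.0 + 1.0
community_cm = np.stack([sample_confusion(diag_prior) for _ in range(M)])

# (ii) Each worker belongs to one community; the worker's own confusion
# matrix is a Dirichlet perturbation centered on the community matrix.
community_of = rng.integers(M, size=W)
concentration = 50.0  # hypothetical: higher keeps workers closer to their community
worker_cm = np.stack([
    sample_confusion(community_cm[community_of[w]] * concentration)
    for w in range(W)
])

# (iii) Latent true item labels and the crowd labels they generate.
true_label = rng.integers(K, size=N)
observations = []  # (item, worker, observed label) triples
for i in range(N):
    for w in rng.choice(W, size=L, replace=False):
        y = rng.choice(K, p=worker_cm[w][true_label[i]])
        observations.append((i, w, int(y)))
```

In this sketch the concentration parameter controls how tightly worker matrices cluster around their community's matrix, which is what lets sparse per-worker data be pooled at the community level, the intuition behind the model's data efficiency.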