Crowdsourcing is an effective method for collecting labeled data for various data mining tasks. Ensuring the veracity of the produced data is critical because responses collected from different users may be noisy and unreliable. Previous work addresses this veracity problem by estimating both user ability and question difficulty from the data within each task individually, so each task requires a large amount of data to produce accurate estimates. In practice, however, the budget a customer provides for a given target task may be limited, so each question can be presented to only a few users and each user can answer only a few questions. This data sparsity causes previous approaches to overfit the scarce data, perform poorly, and ultimately damage data veracity. Fortunately, in real-world applications users answer questions across multiple historical tasks; for example, the same user may annotate images and also label the sentiment of titles. In this paper, we employ transfer learning, which borrows knowledge from auxiliary historical tasks to improve data veracity in a given target task. The motivation is that users have stable characteristics across different crowdsourcing tasks, so data from multiple tasks can be exploited collectively to estimate users' abilities in the target task. We propose a hierarchical Bayesian model, TLC (Transfer Learning for Crowdsourcing), which implements this idea by treating the overlapping users as a bridge between tasks. In addition, to avoid possible negative transfer, TLC introduces task-specific factors to model task differences. Experimental results on various labeling tasks show that, under very limited budgets, TLC significantly improves accuracy over several state-of-the-art non-transfer-learning approaches.
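The abstract does not give TLC's equations, so the following is a minimal illustrative sketch rather than the paper's actual hierarchical Bayesian model: an EM-style one-coin worker model in which each worker's shared ability a_u is estimated jointly from all tasks (the overlapping-user bridge), while a regularized task-specific offset s_{u,t} absorbs task differences to limit negative transfer. All names here (fit_transfer_sketch, reg_task, the synthetic tasks) are invented for illustration.

```python
import numpy as np
from collections import defaultdict

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_transfer_sketch(responses, n_iters=100, lr=0.5, reg_task=1.0):
    """EM-style label aggregation with worker abilities shared across tasks.

    responses: list of (task, question, worker, answer) tuples, answer in {0, 1}.
    Questions are identified by the (task, question) pair.
    Returns (posterior P(true label = 1) per question, shared ability per worker).
    """
    workers = sorted({u for _, _, u, _ in responses})
    questions = sorted({(t, q) for t, q, _, _ in responses})
    w_idx = {u: i for i, u in enumerate(workers)}
    q_idx = {tq: i for i, tq in enumerate(questions)}

    a = np.zeros(len(workers))   # shared worker abilities (pooled over all tasks)
    s = defaultdict(float)       # task-specific factors, one per (worker, task)

    # Initialize label posteriors with a smoothed majority vote to break symmetry.
    ones = np.zeros(len(questions))
    total = np.zeros(len(questions))
    for t, q, u, y in responses:
        ones[q_idx[(t, q)]] += y
        total[q_idx[(t, q)]] += 1
    post = (ones + 1.0) / (total + 2.0)

    for _ in range(n_iters):
        # M-step: one averaged gradient step on the expected log-likelihood.
        # For each response, c = P(answer is correct) under the current posterior,
        # and the gradient of c*log p + (1-c)*log(1-p) w.r.t. the logit is c - p.
        grad_a, cnt_a = np.zeros_like(a), np.zeros_like(a)
        grad_s, cnt_s = defaultdict(float), defaultdict(float)
        for t, q, u, y in responses:
            p = sigmoid(a[w_idx[u]] + s[(u, t)])
            c = post[q_idx[(t, q)]] if y == 1 else 1.0 - post[q_idx[(t, q)]]
            grad_a[w_idx[u]] += c - p
            cnt_a[w_idx[u]] += 1
            grad_s[(u, t)] += c - p
            cnt_s[(u, t)] += 1
        a += lr * grad_a / np.maximum(cnt_a, 1.0)
        for k in grad_s:
            # L2 shrinkage keeps the task-specific factors small, limiting
            # negative transfer when tasks genuinely differ.
            s[k] += lr * (grad_s[k] / cnt_s[k] - reg_task * s[k])

        # E-step: recompute each question's label posterior from worker reliabilities.
        log_odds = np.zeros(len(questions))
        for t, q, u, y in responses:
            p = np.clip(sigmoid(a[w_idx[u]] + s[(u, t)]), 1e-6, 1 - 1e-6)
            log_odds[q_idx[(t, q)]] += (1 if y == 1 else -1) * np.log(p / (1 - p))
        post = sigmoid(log_odds)

    return {tq: post[q_idx[tq]] for tq in questions}, dict(zip(workers, a))

if __name__ == "__main__":
    # Hypothetical usage: two data-rich historical tasks plus a sparse target task "t2".
    resp = [("t0", 0, "alice", 1), ("t0", 0, "bob", 0), ("t0", 1, "alice", 1),
            ("t1", 0, "alice", 0), ("t1", 0, "carol", 0),
            ("t2", 0, "alice", 1), ("t2", 0, "bob", 1)]
    posteriors, abilities = fit_transfer_sketch(resp)
    print(posteriors, abilities)
```

In this sketch, a worker who answers many questions in data-rich historical tasks contributes most of the evidence for the shared ability a_u, so even one or two answers from that worker in the budget-limited target task arrive pre-calibrated, which is the intuition behind using overlapping users as a bridge.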