Cross-task crowdsourcing

  • Authors: Kaixiang Mo, Erheng Zhong, Qiang Yang
  • Affiliations: Hong Kong University of Science and Technology, Hong Kong (Mo, Zhong); Hong Kong University of Science and Technology & Huawei Noah's Ark Lab, Hong Kong (Yang)
  • Venue: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year: 2013


Abstract

Crowdsourcing is an effective method for collecting labeled data for various data mining tasks. It is critical to ensure the veracity of the produced data, because responses collected from different users may be noisy and unreliable. Previous works address this veracity problem by estimating both user ability and question difficulty from the data of each task individually. In that setting, each task needs a large amount of data to produce accurate estimates. In practice, however, the budget a customer provides for a given target task may be limited, so each question can be presented to only a few users and each user answers only a few questions. This data sparsity can cause previous approaches to perform poorly, because they overfit the scarce data and ultimately damage data veracity. Fortunately, in real-world applications users answer questions from multiple historical tasks; for example, the same user may both annotate images and label the sentiment of titles. In this paper, we employ transfer learning, which borrows knowledge from auxiliary historical tasks to improve data veracity in a given target task. The motivation is that users have stable characteristics across different crowdsourcing tasks, so data from different tasks can be exploited collectively to estimate users' abilities in the target task. We propose a hierarchical Bayesian model, TLC (Transfer Learning for Crowdsourcing), which implements this idea by using the overlapping users as a bridge. In addition, to avoid possible negative transfer, TLC introduces task-specific factors to model task differences. Experimental results show that TLC significantly improves accuracy over several state-of-the-art non-transfer-learning approaches under very limited budgets across various labeling tasks.
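To make the transfer intuition concrete, here is a toy sketch (not the paper's actual TLC model, which is a full hierarchical Bayesian formulation; all names and the ability-estimation heuristic below are hypothetical). It simulates users whose accuracy is stable across tasks, then compares label aggregation on a tiny target task when user abilities are estimated from the target task alone versus pooled with a larger auxiliary task.

```python
# Toy illustration of the cross-task idea: users share a stable latent
# accuracy across tasks, so pooling their answers from an auxiliary task
# improves ability estimates when the target-task budget is tiny.
# This is a simplified sketch, NOT the TLC model from the paper.
import random

random.seed(0)

N_USERS = 20
# Hidden per-user accuracy, shared across tasks (the "stable characteristic").
TRUE_ABILITY = {u: random.uniform(0.55, 0.95) for u in range(N_USERS)}

def simulate(n_questions, answers_per_q=3):
    """Each question has a true binary label; a few users answer it noisily."""
    data = []
    for _ in range(n_questions):
        truth = random.choice([0, 1])
        users = random.sample(range(N_USERS), answers_per_q)
        answers = {u: truth if random.random() < TRUE_ABILITY[u] else 1 - truth
                   for u in users}
        data.append((truth, answers))
    return data

def estimate_ability(datasets):
    """Estimate each user's accuracy against per-question majority votes,
    pooling all provided datasets (Laplace-smoothed counts)."""
    correct = {u: 1.0 for u in range(N_USERS)}
    total = {u: 2.0 for u in range(N_USERS)}
    for data in datasets:
        for _, answers in data:
            votes = list(answers.values())
            majority = 1 if 2 * sum(votes) >= len(votes) else 0
            for u, a in answers.items():
                correct[u] += (a == majority)
                total[u] += 1
    return {u: correct[u] / total[u] for u in range(N_USERS)}

def weighted_accuracy(data, ability):
    """Aggregate labels by ability-weighted voting; return accuracy vs truth."""
    hits = 0
    for truth, answers in data:
        score = sum((2 * a - 1) * ability[u] for u, a in answers.items())
        hits += int(score >= 0) == truth
    return hits / len(data)

target = simulate(30)       # target task: very limited budget
auxiliary = simulate(500)   # historical task with plentiful responses

acc_single = weighted_accuracy(target, estimate_ability([target]))
acc_pooled = weighted_accuracy(target, estimate_ability([target, auxiliary]))
print(f"target-only abilities: {acc_single:.2f}, pooled abilities: {acc_pooled:.2f}")
```

The sketch omits what makes TLC robust in practice: it has no question-difficulty variables and no task-specific factors, so it cannot guard against negative transfer when tasks differ; the paper's hierarchical model adds both.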