Learning an accurate entity resolution model from crowdsourced labels

  • Authors:
  • Jingjing Wang; Satoshi Oyama; Masahito Kurihara; Hisashi Kashima

  • Affiliations:
  • Hokkaido University, Sapporo, Japan; Hokkaido University, Sapporo, Japan; Hokkaido University, Sapporo, Japan; The University of Tokyo, Tokyo, Japan

  • Venue:
  • Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2014

Abstract

We investigated the use of supervised learning methods that use labels from crowd workers to resolve entities. Although obtaining labeled data by crowdsourcing can reduce time and cost, it also brings challenges (e.g., coping with the variable quality of crowd-generated data). First, we evaluated the quality of crowd-generated labels for actual entity resolution data sets. Then, we evaluated the prediction accuracy of two machine learning methods that use labels from crowd workers: a conventional LPP method using consensus labels obtained by majority voting and our proposed method that combines multiple Laplacians directly by using crowdsourced data. We discussed the relationship between the accuracy of workers' labels and the prediction accuracy of the two methods.
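
The consensus-label step mentioned in the abstract (majority voting over crowd workers' labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the binary label encoding (1 = same entity, 0 = different entities), the sample record pairs, and the rule that ties fall back to 0 are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels_by_pair):
    """Return one consensus label per record pair.

    labels_by_pair maps a record-pair id to the list of binary labels
    given by crowd workers (1 = same entity, 0 = different entities).
    Ties are resolved as 0 (non-match); this rule is an assumption.
    """
    consensus = {}
    for pair, labels in labels_by_pair.items():
        counts = Counter(labels)
        # A pair is labeled a match only if strictly more workers voted 1.
        consensus[pair] = 1 if counts[1] > counts[0] else 0
    return consensus


# Hypothetical crowd labels for three record pairs.
crowd_labels = {
    ("r1", "r2"): [1, 1, 0],
    ("r1", "r3"): [0, 0, 1],
    ("r2", "r3"): [1, 0],
}
consensus = majority_vote(crowd_labels)
```

The resulting consensus labels could then be used as ordinary supervised labels for a conventional learner. Voting discards information about which worker gave which label, and the abstract contrasts this with a method that uses the crowdsourced data directly.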