Learning an accurate entity resolution model from crowdsourced labels

Authors:
Jingjing Wang;Satoshi Oyama;Masahito Kurihara;Hisashi Kashima
Affiliations:
Hokkaido University, Sapporo, Japan;Hokkaido University, Sapporo, Japan;Hokkaido University, Sapporo, Japan;The University of Tokyo, Tokyo, Japan
Venue:
Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
Year:
2014

Abstract

We investigated supervised learning methods for entity resolution that are trained on labels from crowd workers. Although crowdsourcing can reduce the time and cost of obtaining labeled data, it also brings challenges, chiefly the variable quality of crowd-generated labels. First, we evaluated the quality of crowd-generated labels on actual entity resolution data sets. Then, we evaluated the prediction accuracy of two machine learning methods that use these labels: a conventional locality preserving projections (LPP) method trained on consensus labels obtained by majority voting, and our proposed method, which combines multiple Laplacians built directly from the crowdsourced data. Finally, we discussed the relationship between the accuracy of the workers' labels and the prediction accuracy of the two methods.
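The two ways of using crowd labels contrasted in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the Laplacians are weighted or combined, so the unweighted sum below, the tie-breaking rule, and the names `majority_vote`, `laplacian`, and `combined_laplacian` are all assumptions made for illustration. Each worker labels record pairs as matching (1) or non-matching (0).

```python
from collections import Counter


def majority_vote(labels_by_worker):
    """Collapse per-worker pair labels into one consensus label per pair."""
    votes = {}
    for worker_labels in labels_by_worker.values():
        for pair, label in worker_labels.items():
            votes.setdefault(pair, []).append(label)
    consensus = {}
    for pair, pair_votes in votes.items():
        counts = Counter(pair_votes)
        # Assumption: ties are resolved as "non-match" (0).
        consensus[pair] = 1 if counts[1] > counts[0] else 0
    return consensus


def laplacian(n, match_pairs):
    """Unnormalized graph Laplacian L = D - W over n records,
    with W[i][j] = 1 for every pair labeled as a match."""
    edges = {tuple(sorted(p)) for p in match_pairs if p[0] != p[1]}
    lap = [[0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1
        lap[j][i] -= 1
        lap[i][i] += 1
        lap[j][j] += 1
    return lap


def combined_laplacian(n, labels_by_worker):
    """Assumed combination rule: sum one Laplacian per worker,
    without first reducing the labels to a consensus."""
    total = [[0] * n for _ in range(n)]
    for worker_labels in labels_by_worker.values():
        matches = [p for p, y in worker_labels.items() if y == 1]
        lap = laplacian(n, matches)
        for i in range(n):
            for j in range(n):
                total[i][j] += lap[i][j]
    return total


# Hypothetical toy data: three records, three workers.
workers = {
    "a": {(0, 1): 1, (1, 2): 0},
    "b": {(0, 1): 1, (1, 2): 1},
    "c": {(0, 1): 0, (1, 2): 0},
}

consensus = majority_vote(workers)
consensus_lap = laplacian(3, [p for p, y in consensus.items() if y == 1])
summed_lap = combined_laplacian(3, workers)

print(consensus)      # {(0, 1): 1, (1, 2): 0}
print(consensus_lap)  # [[1, -1, 0], [-1, 1, 0], [0, 0, 0]]
print(summed_lap)     # [[2, -2, 0], [-2, 3, -1], [0, -1, 1]]
```

In this toy case majority voting discards worker b's minority vote on the pair (1, 2), whereas the summed Laplacian keeps it as a weaker edge. Either matrix could then be passed to an LPP-style projection step, which is omitted here.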