Entity resolution is central to data integration and data cleaning. Algorithmic approaches have been improving in quality but remain far from perfect. Crowdsourcing platforms offer a more accurate, but expensive and slow, way to bring human insight into the process. Previous work has proposed batching verification tasks for presentation to human workers, but even with batching, a human-only approach is infeasible for data sets of even moderate size because of the large number of candidate matches to be tested. Instead, we propose a hybrid human-machine approach in which machines do an initial, coarse pass over all the data, and people verify only the most likely matching pairs. We show that for such a hybrid system, generating the minimum number of verification tasks of a given size is NP-hard, and we develop a novel two-tiered heuristic approach for creating batched tasks. We describe this method and present the results of extensive experiments on real data sets using a popular crowdsourcing platform. The experiments show that our hybrid approach achieves both good efficiency and high accuracy compared to machine-only or human-only alternatives.
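The hybrid workflow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name a similarity measure, so token-level Jaccard similarity stands in for the machine pass; the threshold value, the sample records, and the function names `jaccard`, `candidate_pairs`, and `batch_tasks` are hypothetical; and the batching step is plain fixed-size chunking, not the two-tiered heuristic the paper develops.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in for the machine pass)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(records, threshold):
    """Coarse machine pass: score every pair, keep only the likely matches."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    # Most likely matches first, so they are verified first.
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


def batch_tasks(pairs, batch_size):
    """Naive batching: chunk the surviving pairs into fixed-size verification tasks.

    This is NOT the paper's two-tiered heuristic, only a placeholder for it.
    """
    return [pairs[k:k + batch_size] for k in range(0, len(pairs), batch_size)]


# Hypothetical product records, invented for this example.
records = [
    "iPad 2 16GB WiFi White",
    "iPad 2nd generation 16GB WiFi White",
    "iPhone 4 16GB White",
    "Apple iPhone 4 16GB White",
    "Kindle Fire 8GB",
]

pairs = candidate_pairs(records, threshold=0.4)
tasks = batch_tasks(pairs, batch_size=2)

for n, task in enumerate(tasks, start=1):
    for i, j, score in task:
        print(f"task {n}: verify ({records[i]!r}, {records[j]!r}) score={score:.2f}")
```

With these five records there are ten possible pairs, but only two pass the assumed threshold, so a human-only approach would ask about five times as many questions. The point of the sketch is the division of labor: the machine prunes the pair space cheaply, and people see only what survives.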