Efficiency of a Good But Not Linear Set Union Algorithm
Journal of the ACM (JACM)
Machine Learning
Pay-as-you-go user feedback for dataspace systems
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Entity resolution with iterative blocking
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Quality management on Amazon Mechanical Turk
Proceedings of the ACM SIGKDD Workshop on Human Computation
Exploring iterative and parallel human computation processes
Proceedings of the ACM SIGKDD Workshop on Human Computation
TurKit: human computation algorithms on mechanical turk
UIST '10 Proceedings of the 23nd annual ACM symposium on User interface software and technology
Human-assisted graph search: it's okay to ask questions
Proceedings of the VLDB Endowment
CrowdDB: answering queries with crowdsourcing
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
CrowdForge: crowdsourcing complex work
Proceedings of the 24th annual ACM symposium on User interface software and technology
Proceedings of the VLDB Endowment
Proceedings of the 21st international conference on World Wide Web
Max algorithms in crowdsourcing environments
Proceedings of the 21st international conference on World Wide Web
CrowdScreen: algorithms for filtering data with humans
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
So who won?: dynamic max discovery with the crowd
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
CDAS: a crowdsourcing data analytics system
Proceedings of the VLDB Endowment
CrowdER: crowdsourcing entity resolution
Proceedings of the VLDB Endowment
A human-machine method for web table understanding
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Hi-index | 0.00 |
The development of crowdsourced query processing systems has recently attracted a significant attention in the database community. A variety of crowdsourced queries have been investigated. In this paper, we focus on the crowdsourced join query which aims to utilize humans to find all pairs of matching objects from two collections. As a human-only solution is expensive, we adopt a hybrid human-machine approach which first uses machines to generate a candidate set of matching pairs, and then asks humans to label the pairs in the candidate set as either matching or non-matching. Given the candidate pairs, existing approaches will publish all pairs for verification to a crowdsourcing platform. However, they neglect the fact that the pairs satisfy transitive relations. As an example, if o1 matches with o2, and o2 matches with o3, then we can deduce that o1 matches with o3 without needing to crowdsource (o1, o3). To this end, we study how to leverage transitive relations for crowdsourced joins. We propose a hybrid transitive-relations and crowdsourcing labeling framework which aims to crowdsource the minimum number of pairs to label all the candidate pairs. We prove the optimal labeling order and devise a parallel labeling algorithm to efficiently crowdsource the pairs following the order. We evaluate our approaches in both simulated environment and a real crowdsourcing platform. Experimental results show that our approaches with transitive relations can save much more money and time than existing methods, with a little loss in the result quality.