Group-by and top-k are fundamental constructs in database queries. However, the criteria used for grouping and ordering certain types of data (for example, clustering unlabeled photos by the person they depict and ordering them by that person's age) are difficult for machines to evaluate. In contrast, these tasks are easy for humans, and are therefore natural candidates for crowdsourcing. We study the problem of evaluating top-k and group-by queries using the crowd to answer two kinds of questions: type questions and value questions. Given two data elements, the answer to a type question is "yes" if the elements have the same type and therefore belong to the same group or cluster; the answer to a value question orders the two data elements. The assumption here is that there is an underlying ground truth, but that the answers returned by the crowd may sometimes be erroneous. We formalize the problems of top-k and group-by in the crowdsourced setting, and give efficient algorithms that are guaranteed to achieve good results with high probability. We analyze the crowdsourced cost of these algorithms in terms of the total number of type and value questions, and show that they are essentially the best possible. We also show that fewer questions are needed when values and types are correlated, or when the error model is one in which the error decreases as the distance between the two elements in the sorted order increases.
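To make the question model concrete, the following is a minimal illustrative sketch (not the paper's actual algorithms) of how noisy type and value questions might be simulated and aggregated by majority vote. The oracle functions, the fixed per-question error probability `error_prob`, and the repeat count `repeats` are all assumptions introduced here for illustration.

```python
import random

def noisy_value_question(a, b, error_prob=0.2, rng=random):
    """Simulated crowd answer to a value question: is a > b?
    Hypothetical error model: the true answer is flipped with
    probability error_prob, independently per question."""
    truth = a > b
    return truth if rng.random() >= error_prob else not truth

def noisy_type_question(a, b, type_of, error_prob=0.2, rng=random):
    """Simulated crowd answer to a type question: do a and b have the
    same type?  Same illustrative flip-with-probability error model."""
    truth = type_of(a) == type_of(b)
    return truth if rng.random() >= error_prob else not truth

def crowd_max(items, repeats=15, error_prob=0.2, rng=random):
    """Sequential max discovery using value questions.  Each pairwise
    comparison is asked `repeats` times and decided by majority vote,
    so each comparison is correct with high probability whenever
    error_prob < 1/2."""
    current = items[0]
    for challenger in items[1:]:
        votes = sum(noisy_value_question(challenger, current,
                                         error_prob, rng)
                    for _ in range(repeats))
        if votes > repeats // 2:
            current = challenger
    return current

def crowd_group(items, type_of, repeats=15, error_prob=0.2, rng=random):
    """Greedy grouping using type questions: compare each element to
    one representative per existing cluster, deciding membership by
    majority vote over `repeats` repeated questions."""
    clusters = []  # each cluster is a list; its first element is the representative
    for x in items:
        for cluster in clusters:
            votes = sum(noisy_type_question(x, cluster[0], type_of,
                                            error_prob, rng)
                        for _ in range(repeats))
            if votes > repeats // 2:
                cluster.append(x)
                break
        else:
            clusters.append([x])  # no cluster matched: start a new one
    return clusters
```

With `error_prob = 0` the oracles are exact and the procedures reduce to ordinary max-finding and grouping; with noise, increasing `repeats` trades more questions (higher crowdsourced cost) for a higher probability of a correct result, which is exactly the cost/accuracy trade-off the abstract analyzes.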