Using the crowd for top-k and group-by queries

Authors:
Susan B. Davidson; Sanjeev Khanna; Tova Milo; Sudeepa Roy
Affiliations:
University of Pennsylvania; University of Pennsylvania; Tel Aviv University; University of Washington
Venue:
Proceedings of the 16th International Conference on Database Theory
Year:
2013

Abstract

Group-by and top-k are fundamental constructs in database queries. However, the criteria used for grouping and ordering certain types of data (for example, unlabeled photos that must be clustered by person and ordered by age) are difficult for machines to evaluate. In contrast, these tasks are easy for humans and are therefore natural candidates for crowd-sourcing. We study the problem of evaluating top-k and group-by queries using the crowd to answer either type or value questions. Given two data elements, the answer to a type question is "yes" if the elements have the same type and therefore belong to the same group or cluster; the answer to a value question orders the two data elements. We assume that there is an underlying ground truth, but that the answers returned by the crowd may sometimes be erroneous. We formalize the problems of top-k and group-by in the crowd-sourced setting, and give efficient algorithms that are guaranteed to achieve good results with high probability. We analyze the crowd-sourced cost of these algorithms in terms of the total number of type and value questions, and show that they are essentially the best possible. We also show that fewer questions are needed when values and types are correlated, or when the error model is one in which the probability of error decreases as the distance between the two elements in the sorted order increases.
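
To make the two question types concrete, the following is a minimal Python sketch of a simulated crowd. It is not the algorithm from the paper: the class `NoisyCrowd`, the constant error probability `error_prob`, the `repeats` parameter, and the use of simple majority voting are all assumptions introduced here for illustration. The sketch only shows how type and value questions could be asked against a hidden ground truth and how the number of questions, the cost measure mentioned in the abstract, could be counted.

```python
import random


class NoisyCrowd:
    """Hypothetical crowd simulator (an assumption, not from the paper).

    Each answer is flipped independently with probability error_prob.
    """

    def __init__(self, values, types, error_prob=0.2, seed=0):
        self.values = values          # hidden ground-truth values
        self.types = types            # hidden ground-truth types
        self.error_prob = error_prob
        self.rng = random.Random(seed)
        self.questions_asked = 0      # cost: total type + value questions

    def _noisy(self, truth):
        self.questions_asked += 1
        flip = self.rng.random() < self.error_prob
        return (not truth) if flip else truth

    def value_question(self, a, b):
        """True if the crowd says element a has a larger value than b."""
        return self._noisy(self.values[a] > self.values[b])

    def type_question(self, a, b):
        """True if the crowd says elements a and b have the same type."""
        return self._noisy(self.types[a] == self.types[b])


def majority(ask, a, b, repeats):
    """Ask the same question `repeats` times and take the majority answer."""
    yes = sum(ask(a, b) for _ in range(repeats))
    return 2 * yes > repeats


def crowd_max(crowd, items, repeats=7):
    """Linear scan for the maximum using majority-voted value questions."""
    best = items[0]
    for x in items[1:]:
        if majority(crowd.value_question, x, best, repeats):
            best = x
    return best


def crowd_group_by(crowd, items, repeats=7):
    """Cluster items by comparing each one to one representative per group."""
    groups = []  # groups[i][0] serves as the representative of group i
    for x in items:
        for g in groups:
            if majority(crowd.type_question, x, g[0], repeats):
                g.append(x)
                break
        else:
            groups.append([x])  # no existing group matched: start a new one
    return groups


if __name__ == "__main__":
    values = [3, 9, 1, 7]
    types = ["a", "b", "a", "b"]
    crowd = NoisyCrowd(values, types, error_prob=0.2, seed=1)
    items = list(range(len(values)))
    print("max element index:", crowd_max(crowd, items))
    print("groups:", crowd_group_by(crowd, items))
    print("questions asked:", crowd.questions_asked)
```

Repeating each question and taking a majority is only one simple way to cope with erroneous answers, and it multiplies the cost by `repeats`. With `error_prob=0.0` and `repeats=1` the sketch reduces to exact comparison-based max and grouping.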