CrowdScreen: algorithms for filtering data with humans

Authors:
Aditya G. Parameswaran; Hector Garcia-Molina; Hyunjung Park; Neoklis Polyzotis; Aditya Ramesh; Jennifer Widom
Affiliations:
Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA; UC Santa Cruz, Santa Cruz, CA, USA; Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA
Venue:
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Year:
2012

Abstract

Given a large set of data items, we consider the problem of filtering them based on a set of properties that can be verified by humans. This problem is commonplace in crowdsourcing applications, and yet, to our knowledge, no one has considered the formal optimization of this problem. (Typical solutions use heuristics to solve the problem.) We formally state a few different variants of this problem. We develop deterministic and probabilistic algorithms to optimize the expected cost (i.e., number of questions) and expected error. We experimentally show that our algorithms provide definite gains with respect to other strategies. Our algorithms can be applied in a variety of crowdsourcing scenarios and can form an integral part of any query processor that uses human computation.
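
To make the two quantities in the abstract concrete, here is a minimal sketch that computes the expected cost (number of questions) and expected error of one fixed filtering strategy for a single item. It is not the paper's algorithm. It assumes independent worker answers with a fixed false-negative rate and false-positive rate, and a known prior probability (`selectivity`) that an item satisfies the filter; none of these parameters are stated in the abstract. The names `evaluate_strategy` and `majority_of` are hypothetical.

```python
def majority_of(m):
    """Hypothetical strategy: majority vote over up to m answers, stopping early."""
    need = m // 2 + 1

    def decide(yes, no):
        if yes >= need:
            return "pass"
        if no >= need:
            return "fail"
        return "continue"

    return decide


def evaluate_strategy(decide, selectivity, fn_rate, fp_rate, max_questions):
    """Return (expected_cost, expected_error) for one item.

    decide(yes, no) returns "pass", "fail", or "continue".
    Assumptions (not from the source): answers are independent; a true item
    gets a "no" with probability fn_rate; a false item gets a "yes" with
    probability fp_rate; an item is true with probability selectivity.
    """
    # Probability mass of each (yes count, no count, true value) state.
    frontier = {(0, 0, True): selectivity, (0, 0, False): 1.0 - selectivity}
    expected_cost = 0.0
    expected_error = 0.0
    for asked in range(max_questions + 1):
        next_frontier = {}
        for (yes, no, truth), prob in frontier.items():
            action = decide(yes, no)
            if action == "continue" and asked == max_questions:
                # Budget exhausted: fall back to majority, ties count as "fail".
                action = "pass" if yes > no else "fail"
            if action != "continue":
                expected_cost += prob * asked
                if (action == "pass") != truth:
                    expected_error += prob
                continue
            p_yes = (1.0 - fn_rate) if truth else fp_rate
            for d_yes, d_no, p in ((1, 0, p_yes), (0, 1, 1.0 - p_yes)):
                key = (yes + d_yes, no + d_no, truth)
                next_frontier[key] = next_frontier.get(key, 0.0) + prob * p
        frontier = next_frontier
    return expected_cost, expected_error


if __name__ == "__main__":
    cost, error = evaluate_strategy(majority_of(3), 0.5, 0.1, 0.1, 3)
    print(f"expected cost = {cost:.3f}, expected error = {error:.3f}")
```

Evaluating candidate strategies this way lets them be compared on both axes at once; an optimizer would then search over strategies for the cheapest one that keeps the error under a chosen bound.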