Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming and cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help by harnessing flexible, powerful human cognition on demand, but it introduces its own challenges, including workers' limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent, accurate categories when each worker sees only a small portion of the data. To address these challenges, we present an empirical study of a two-stage approach that enables crowds to create an accurate and useful overview of a dataset: A) drawing on cognitive theory, we assess how re-representing data can shorten it and focus it on salient dimensions; and B) we introduce an iterative clustering approach that gives workers a global overview of the data. We demonstrate that a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.
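The iterative clustering idea, where each worker labels only a small batch but all workers draw on a shared, growing list of categories, can be illustrated with a minimal sketch. This is an assumption-laden toy simulation, not the paper's actual protocol: the function names, the batch structure, and the keyword-reuse heuristic in `toy_label_fn` are all illustrative.

```python
def crowd_cluster(batches, label_fn):
    """Toy simulation (not the paper's protocol): each batch is labeled by one
    simulated worker, and the shared label list grows across iterations so
    later workers can reuse earlier categories -- the 'global overview'."""
    global_labels = []   # shared category list visible to every worker
    assignments = {}     # item -> assigned category
    for batch in batches:
        for item in batch:
            # The worker sees the current global label list when labeling.
            label = label_fn(item, global_labels)
            if label not in global_labels:
                global_labels.append(label)
            assignments[item] = label
    return assignments, global_labels

def toy_label_fn(item, existing_labels):
    # Illustrative heuristic: reuse an existing label if it appears as a word
    # in the item; otherwise propose the item's first word as a new label.
    words = set(item.lower().split())
    for label in existing_labels:
        if label in words:
            return label
    return item.lower().split()[0]

assignments, labels = crowd_cluster(
    [["crowd clustering methods", "scaling crowd work"],
     ["topic model visualization", "topic labeling games"]],
    toy_label_fn,
)
# Items in the second batch reuse the "topic" category created by an
# earlier item, rather than inventing redundant near-duplicates.
```

The point of the sketch is the data flow, not the labeling heuristic: because every worker labels against the same accumulating category list, categories stay consistent even though no worker ever sees the whole dataset.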