Word clouds for efficient document labeling

Authors:
Christin Seifert; Eva Ulbrich; Michael Granitzer
Affiliations:
University of Technology Graz, Austria; Know-Center, Graz, Austria; University of Technology Graz and Know-Center, Graz, Austria
Venue:
DS'11 Proceedings of the 14th international conference on Discovery science
Year:
2011

Abstract

In text classification, the amount and quality of training data are crucial for classifier performance. Training data is generated by human labelers, which is tedious and time-consuming work. We propose using condensed representations of text documents instead of the full text to reduce the labeling time per document. These condensed representations are key sentences and key phrases, and they can be generated in a fully unsupervised way. The key phrases are presented in a layout similar to a tag cloud. In a user study with 37 participants, we evaluated whether human labelers can label documents faster and equally accurately with these condensed representations. Our evaluation shows that users labeled word clouds twice as fast as full-text documents and just as accurately. While further investigation of different classification tasks is necessary, this finding could reduce the cost of labeling text documents.
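
The abstract does not state which extraction method produces the key phrases. As a rough illustration of the general idea only, the sketch below assumes a plain TF-IDF ranking of single words (a stand-in, not the authors' method) and maps the resulting weights linearly to font sizes, as a tag-cloud layout would. The stopword list, the function names `tokenize`, `top_terms`, and `font_sizes`, and the point-size range are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list would be longer).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_terms(doc, corpus, k=5):
    """Rank the terms of `doc` by TF-IDF against `corpus`; no labels are needed."""
    doc_freq = Counter()
    for other in corpus:
        doc_freq.update(set(tokenize(other)))  # count each term once per document
    counts = Counter(tokenize(doc))
    n_docs = len(corpus)
    scores = {
        term: tf * math.log((1 + n_docs) / (1 + doc_freq[term]))
        for term, tf in counts.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def font_sizes(weighted_terms, min_pt=10, max_pt=32):
    """Linearly map term weights to font sizes for a tag-cloud-like display."""
    if not weighted_terms:
        return {}
    weights = [w for _, w in weighted_terms]
    lo, hi = min(weights), max(weights)
    span = (hi - lo) or 1.0  # avoid division by zero when all weights are equal
    return {
        term: round(min_pt + (w - lo) / span * (max_pt - min_pt))
        for term, w in weighted_terms
    }


if __name__ == "__main__":
    corpus = ["stocks stocks bonds", "football match", "football team"]
    cloud = font_sizes(top_terms(corpus[0], corpus))
    for term, size in cloud.items():
        print(f"{term}: {size}pt")
```

A labeler would then see only the sized terms rather than the full text. Whether such a cloud preserves enough information for accurate labeling is what the user study described above evaluates.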