Word clouds for efficient document labeling

  • Authors: Christin Seifert; Eva Ulbrich; Michael Granitzer
  • Affiliations: University of Technology Graz, Austria; Know-Center, Graz, Austria; University of Technology Graz and Know-Center, Graz, Austria
  • Venue: DS'11 Proceedings of the 14th international conference on Discovery science
  • Year: 2011

Abstract

In text classification, the amount and quality of training data are crucial for the performance of the classifier. Training data is generated by human labelers, a tedious and time-consuming task. We propose to use condensed representations of text documents instead of the full-text documents to reduce the labeling time for single documents. These condensed representations are key sentences and key phrases, and they can be generated in a fully unsupervised way. The key phrases are presented in a layout similar to a tag cloud. In a user study with 37 participants, we evaluated whether human labelers can label documents faster and equally accurately with these condensed representations. Our evaluation shows that the users labeled word clouds twice as fast as full-text documents, and as accurately. While further investigations for different classification tasks are necessary, this insight could potentially reduce the cost of labeling text documents.
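
As a rough illustration of how a condensed, cloud-style representation might be produced without supervision, the sketch below scores a document's terms and maps the scores to font sizes. The abstract does not say how the key phrases are extracted, so the TF-IDF scoring used here is a stand-in and not the authors' method. The stopword list, the function names `key_terms` and `font_sizes`, and the point-size range are likewise assumptions made for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def key_terms(docs, doc_index, top_k=5):
    """Rank the terms of one document by TF-IDF against the whole corpus.

    TF-IDF is a stand-in unsupervised scorer; the source does not
    specify which key-phrase extraction technique was used.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    tf = Counter(tokenized[doc_index])
    scores = {
        term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
        for term, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


def font_sizes(ranked, min_pt=10.0, max_pt=32.0):
    """Map term scores linearly onto a font-size range for a cloud layout."""
    if not ranked:
        return {}
    values = [score for _, score in ranked]
    lo, hi = min(values), max(values)
    if hi == lo:
        # All terms are equally important: give them the same size.
        return {term: max_pt for term, _ in ranked}
    return {
        term: round(min_pt + (score - lo) / (hi - lo) * (max_pt - min_pt), 1)
        for term, score in ranked
    }


if __name__ == "__main__":
    corpus = [
        "football football match goal",
        "budget vote parliament",
        "match report budget",
    ]
    top = key_terms(corpus, doc_index=0, top_k=2)
    print(font_sizes(top))
```

A labeler would then see only the few largest terms of each document rather than its full text, which is the kind of condensed view the user study compares against full-text labeling.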