An unsupervised hierarchical approach to document categorization

  • Authors:
  • Robert Wetzker; Tansu Alpcan; Christian Bauckhage; Winfried Umbrath; Sahin Albayrak

  • Venue:
  • WI '07 Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2007

Abstract

We propose a hierarchical approach to document categorization that requires no pre-configuration and maps the semantic document space to a predefined taxonomy. Using search engines to train a hierarchical classifier makes our approach more flexible than existing solutions, which rely on human-labeled data and are bound to a specific domain. We show that the structural information given by the taxonomy allows for context-aware construction of search queries and leads to higher tagging accuracy. We test our approach on different benchmark datasets and evaluate its performance on single- and multi-tag assignment tasks. The experimental results show that our solution is as accurate as supervised classifiers for web page classification and still performs well when categorizing domain-specific documents.
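
The abstract does not spell out how the search queries are built, so the following is only an illustrative sketch of one way a taxonomy could supply query context, not the authors' implementation. It assumes that a category's query is formed by quoting its label together with the labels of its nearest ancestors. The toy taxonomy, the function name `context_queries`, and the `context_depth` parameter are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical toy taxonomy as nested dicts: label -> sub-categories.
TAXONOMY: Dict[str, dict] = {
    "Science": {"Physics": {}, "Biology": {}},
    "Recreation": {"Autos": {"Jaguar": {}}, "Pets": {}},
}


def context_queries(
    tree: Dict[str, dict],
    ancestors: Tuple[str, ...] = (),
    context_depth: int = 1,
) -> Dict[str, str]:
    """Map each category path to a query string.

    Assumption: a query is the category label plus up to `context_depth`
    (>= 0) ancestor labels, each quoted as a phrase. With context_depth=0
    the query ignores the taxonomy structure entirely.
    """
    queries: Dict[str, str] = {}
    for label, children in tree.items():
        path = ancestors + (label,)
        # Keep the label itself and its closest ancestors as context.
        context = path[-(context_depth + 1):]
        queries["/".join(path)] = " ".join(f'"{term}"' for term in context)
        # Recurse so every node in the hierarchy gets its own query.
        queries.update(context_queries(children, path, context_depth))
    return queries


if __name__ == "__main__":
    for category, query in context_queries(TAXONOMY).items():
        print(f"{category}: {query}")
```

In this sketch, the ambiguous label "Jaguar" yields the query `"Autos" "Jaguar"` rather than `"Jaguar"` alone, which illustrates why ancestor context might steer a search engine toward documents from the intended category. The returned results would then serve as training documents for that node's classifier.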