Hybrid classifiers based on semantic data subspaces for two-level text categorization

  • Authors: Nandita Tripathi; Michael Oakes; Stefan Wermter
  • Affiliations: University of Sunderland, Sunderland, UK; University of Sunderland, Sunderland, UK; University of Hamburg, Hamburg, Germany
  • Venue: International Journal of Hybrid Intelligent Systems
  • Year: 2013

Abstract

Many organizations now keep their data in multi-level categories for easier manageability. An example is the Reuters Corpus, which has news items categorized in a hierarchy of up to five levels. The volume and diversity of documents available in such category hierarchies are also increasing daily, so it becomes difficult for a traditional classifier to efficiently handle multi-level categorization of such a varied document space. In this paper, we present hybrid classifiers involving various two-classifier and four-classifier combinations for two-level text categorization. We show that the classification accuracy of the hybrid combination is better than the classification accuracies of all the corresponding single classifiers. The constituent classifiers of the hybrid combination operate on different subspaces obtained by semantic separation of the data. Our experiments show that dividing a document space into different semantic subspaces increases the efficiency of such hybrid classifier combinations. We further show that hierarchies with a larger number of categories at the first level benefit more from this general hybrid architecture.
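
The general shape of such a hybrid architecture can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the constituent classifiers or say how the semantic subspaces are derived. The sketch assumes that each first-level category defines one subspace, and it uses two deliberately simple stand-in classifiers. All names (`NaiveBayes`, `OverlapClassifier`, `TwoLevelHybrid`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
import math


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (stand-in classifier)."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())

        def log_score(label):
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            prior = math.log(self.class_counts[label] / n_docs)
            return prior + sum(math.log((counts[w] + 1) / total) for w in doc.split())

        return max(self.class_counts, key=log_score)


class OverlapClassifier:
    """Picks the label whose training vocabulary shares the most words with the document."""

    def fit(self, docs, labels):
        self.vocab = defaultdict(set)
        for doc, label in zip(docs, labels):
            self.vocab[label].update(doc.split())
        return self

    def predict(self, doc):
        words = set(doc.split())
        return max(self.vocab, key=lambda label: len(words & self.vocab[label]))


class TwoLevelHybrid:
    """Routes a document to a first-level subspace, then applies that subspace's classifier."""

    def __init__(self, subspace_classifiers):
        # Assumption: maps each first-level category to the classifier class for its subspace.
        self.subspace_classifiers = subspace_classifiers

    def fit(self, docs, labels):
        # Each label is a (first_level, second_level) pair.
        self.router = NaiveBayes().fit(docs, [top for top, _ in labels])
        self.experts = {}
        for top, cls in self.subspace_classifiers.items():
            # Each constituent classifier sees only the documents of its own subspace.
            subset = [(d, sub) for d, (t, sub) in zip(docs, labels) if t == top]
            self.experts[top] = cls().fit([d for d, _ in subset], [s for _, s in subset])
        return self

    def predict(self, doc):
        top = self.router.predict(doc)
        return top, self.experts[top].predict(doc)


# Toy two-level data, invented purely for illustration.
TRAIN = [
    ("shares rose on the stock market", ("economics", "markets")),
    ("stock market index fell sharply", ("economics", "markets")),
    ("central bank raised interest rates", ("economics", "monetary")),
    ("bank cut interest rates again", ("economics", "monetary")),
    ("team won the football match", ("sports", "football")),
    ("football striker scored a goal", ("sports", "football")),
    ("tennis player won the final set", ("sports", "tennis")),
    ("tennis champion served an ace", ("sports", "tennis")),
]

docs = [d for d, _ in TRAIN]
labels = [l for _, l in TRAIN]
model = TwoLevelHybrid({"economics": OverlapClassifier, "sports": NaiveBayes}).fit(docs, labels)
print(model.predict("the bank kept interest rates"))
```

Because each second-level classifier is trained only on its own subspace, it has fewer categories to separate and a narrower vocabulary to model, and a different classifier type can be chosen for each subspace. The sketch shows only the routing structure and makes no claim about accuracy.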