A multi-classifier system for text categorization

  • Authors:
  • Shubhamoy Dey

  • Affiliations:
  • Indian Institute of Management, Prabandh Shikhar, Rau, Indore, India

  • Venue:
  • Proceedings of the 2011 ACM Symposium on Research in Applied Computation
  • Year:
  • 2011

Abstract

Text categorization, the assignment of text documents to one or more pre-defined categories, is one of the most intensely researched text mining tasks. The task may be subdivided into two main parts: the representation of the text documents in some form of numerical vector space, and the application of a suitable supervised learning technique. This research focuses on the second part of the problem. The paper proposes constructing a classification model for each of the pre-defined categories (themes) present in a corpus, using a term-frequency-based 'keyword' identification and document scoring technique. The documents misclassified by each of these category-specific classifier models are then re-classified with the help of the other models. The effectiveness of the approach is demonstrated by experiments on two publicly available BBC News corpora. Good classification accuracy is observed on both corpora. Specifically, the macro-averaged and micro-averaged F-measures of the proposed method on the evaluation dataset for the BBC Sports corpus are 94.7% and 94.3% respectively.
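For readers unfamiliar with the two reported metrics, the following minimal sketch shows how macro-averaged and micro-averaged F-measures are conventionally computed for single-label classification. It is not the paper's evaluation code: the function name `f_measures` and the toy labels are hypothetical, and the standard F1 definition (harmonic mean of precision and recall) is assumed.

```python
from collections import Counter


def f_measures(true_labels, predicted_labels):
    """Return (macro_f1, micro_f1) for single-label predictions.

    Hypothetical helper for illustration; assumes the standard F1 definition.
    """
    categories = sorted(set(true_labels) | set(predicted_labels))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    def f1(tp_count, fp_count, fn_count):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there is nothing to count
        denom = 2 * tp_count + fp_count + fn_count
        return 2 * tp_count / denom if denom else 0.0

    # Macro: average the per-category F1 scores (each category weighs equally).
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in categories) / len(categories)
    # Micro: pool the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy labels (not taken from the BBC corpora)
truth = ["cricket", "cricket", "football", "football", "tennis"]
preds = ["cricket", "football", "football", "football", "tennis"]
macro_f1, micro_f1 = f_measures(truth, preds)
```

Macro-averaging gives every category equal weight regardless of size, while micro-averaging weights every document equally, so the two values diverge when performance differs between small and large categories.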