A class core extraction method for text categorization

  • Authors:
  • Shicai Yu;Jianxing Zhang

  • Affiliations:
  • School of Computer Science and Communication, Lanzhou University of Technology, Lanzhou, China;School of Computer Science and Communication, Lanzhou University of Technology, Lanzhou, China

  • Venue:
  • FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
  • Year:
  • 2009

Abstract

Text categorization is an important research field within text mining. In practice, a document is often full of class-independent "general" words that many documents and classes share. These "general" words harm text categorization rather than contribute to it. Inspired by the human cognitive procedure in text classification tasks, we propose a novel approach, the Class Core Extraction (CCE) method, to extract "core" terms from each class. The "core" terms, which include not only single words but also combinations of words that act as a simple description of context, must be terms with strong distinguishing power. For the testing phase, we also propose a suitable algorithm, which we call the "lottery" algorithm, that uses a weighted matching strategy to make the final categorization decision. Comparative experiments on two datasets show that our approach is more accurate than the k-nearest-neighbor (kNN) based classifier and achieves outstanding efficiency compared with the Support Vector Machine (SVM) based classifier.
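
The abstract does not give the details of CCE or of the "lottery" algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that a term is "core" for a class when a large share of the documents containing it belong to that class, that word combinations are unordered word pairs, and that the final decision is a weighted sum over matched core terms. The names `terms`, `extract_cores`, `classify`, and the `min_purity` threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def terms(text):
    """Return single words plus unordered word pairs from one document."""
    words = sorted(set(text.lower().split()))
    pairs = {" ".join(pair) for pair in combinations(words, 2)}
    return set(words) | pairs


def extract_cores(docs, min_purity=0.8):
    """Build {label: {term: weight}} from (text, label) training pairs.

    Assumption: a term is kept only if at least `min_purity` of the
    documents containing it share one label; "general" words that are
    spread across classes are therefore dropped.
    """
    doc_freq = defaultdict(Counter)  # term -> per-label document frequency
    for text, label in docs:
        for term in terms(text):
            doc_freq[term][label] += 1

    cores = defaultdict(dict)
    for term, by_label in doc_freq.items():
        label, count = by_label.most_common(1)[0]
        purity = count / sum(by_label.values())
        if purity >= min_purity:
            # Assumed weighting: purity scaled by in-class document frequency.
            cores[label][term] = purity * count
    return dict(cores)


def classify(text, cores):
    """Pick the label whose matched core terms have the largest total weight."""
    doc_terms = terms(text)
    scores = {
        label: sum(w for term, w in core.items() if term in doc_terms)
        for label, core in cores.items()
    }
    return max(scores, key=scores.get)
```

With a handful of training documents in which "the" occurs in every class, this sketch would discard "the" as a general word while keeping class-specific words and pairs. The purity threshold and the weighting formula are placeholders; the paper may define distinguishing power and the matching strategy quite differently.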