Using Modified CHI Square and Rough Set for Text Categorization with Many Redundant Features

  • Authors:
  • Liuling Dai;Jinwu Hu;WanChun Liu

  • Venue:
  • ISCID '08 Proceedings of the 2008 International Symposium on Computational Intelligence and Design - Volume 01
  • Year:
  • 2008

Abstract

Text categorization is a key problem in text mining. Although there has been much research on this problem, most work focuses on classification into broad categories. Very little research addresses text categorization problems characterised by many redundant features. We call this kind of problem fine-text-categorization. In this paper, we present an algorithm based on modified CHI square feature selection and rough set theory to solve this problem. The features of each category are selected in an aggressive manner, and the classification rules are extracted using rough set theory. Experiments on real-world corpora show that our algorithm clearly improves classification precision and is therefore promising.
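
The abstract does not spell out the modification to the CHI square statistic, so the following is only a minimal sketch of the standard, unmodified chi-square term score that such feature selection typically starts from. The function names `chi_square` and `top_terms`, the token-list input format, and the cutoff `k` are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square score from a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_terms(docs, labels, category, k):
    """Return the k terms with the highest chi-square score for one category.

    docs: list of token lists; labels: parallel list of category names.
    Keeping only a small k is one (assumed) way to select aggressively.
    """
    in_cat = sum(1 for y in labels if y == category)
    out_cat = len(docs) - in_cat
    df_in, df_out = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count document frequency, so each term counts once per document.
        (df_in if y == category else df_out).update(set(tokens))
    scores = {}
    for term in set(df_in) | set(df_out):
        a, b = df_in[term], df_out[term]
        scores[term] = chi_square(a, b, in_cat - a, out_cat - b)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [["goal", "match"], ["goal", "team"], ["stock", "market"], ["stock", "team"]]
labels = ["sport", "sport", "finance", "finance"]
print(top_terms(docs, labels, "sport", 2))
```

Note that the plain statistic scores a term that is negatively associated with a category ("stock" for "sport" here) as highly as a positively associated one, since the difference is squared.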