Term-frequency Based Feature Selection Methods for Text Categorization

  • Authors: Yan Xu; Lin Chen
  • Venue: ICGEC '10: Proceedings of the 2010 Fourth International Conference on Genetic and Evolutionary Computing
  • Year: 2010

Abstract

A major difficulty in text categorization is the high dimensionality of the feature space. Feature selection is an important step for reducing that space. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization, but they do not use term frequency information: each considers only whether a term occurs in a document, not how often. In this paper, we put forward improved DF, improved IG, and improved MI methods that use term frequency information. Experiments show that the improved methods perform notably better than the original DF, IG, and MI methods.
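
The abstract does not give the formulas for the improved methods, so the following is only a minimal sketch of the general idea for the DF case, not the authors' method. It contrasts standard document frequency with one hypothetical term-frequency-aware variant that counts a document only if the term occurs in it at least `min_tf` times. The function names, the `min_tf` parameter, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def document_frequency(docs):
    """Standard DF: number of documents that contain each term at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts once per term
    return df


def tf_aware_document_frequency(docs, min_tf=2):
    """Hypothetical TF-aware DF (an assumption, not the paper's formula):
    a document counts for a term only if the term occurs >= min_tf times."""
    df = Counter()
    for tokens in docs:
        tf = Counter(tokens)  # term frequency within this document
        df.update(term for term, n in tf.items() if n >= min_tf)
    return df


def select_features(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus (made up for illustration), already tokenized.
docs = [
    "stock market stock price".split(),
    "market news today".split(),
    "stock price price rise".split(),
]

df = document_frequency(docs)
tf_df = tf_aware_document_frequency(docs, min_tf=2)

print(select_features(df, 2))
print(select_features(tf_df, 2))
```

In this toy corpus, "market" appears in two documents but only once in each, so plain DF ranks it alongside "stock" and "price", while the thresholded variant ignores it. This is the kind of distinction that term frequency information makes available to a selection method.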