Text classification using small number of features

  • Authors:
  • Masoud Makrehchi; Mohamed S. Kamel

  • Affiliations:
  • Pattern Analysis and Machine Intelligence Lab, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada; Pattern Analysis and Machine Intelligence Lab, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada

  • Venue:
  • MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2005

Abstract

A feature selection method for text classification is proposed. It ranks terms by information gain and improves the ranking by removing redundant terms, using a mutual information measure and an inclusion index. We report an experiment on the impact of term redundancy on the performance of a text classifier. The results show that term redundancy behaves very similarly to noise and may degrade classifier performance. The proposed method is tested on an SVM text classifier, where feature reduction with this method markedly outperforms plain information-gain-based feature selection.
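The two-stage idea in the abstract (rank terms by information gain, then drop terms that are redundant with terms already kept) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the abstract gives no formulas, so the inclusion index definition used here (fraction of documents containing the candidate term that also contain the already-selected term), the rule that a term counts as redundant only when both its mutual information and its inclusion index reach their thresholds, the threshold values, and all function names (`rank_by_information_gain`, `select_features`, etc.) are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the class label."""
    n = len(docs)
    h_class = entropy(Counter(labels).values())
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    h_cond = 0.0
    for subset in (with_t, without_t):
        if subset:
            h_cond += len(subset) / n * entropy(Counter(subset).values())
    return h_class - h_cond


def term_mutual_information(docs, a, b):
    """MI in bits between the binary occurrence indicators of terms a and b."""
    n = len(docs)
    joint = Counter((a in d, b in d) for d in docs)
    count_a = Counter(a in d for d in docs)
    count_b = Counter(b in d for d in docs)
    mi = 0.0
    for (xa, xb), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[xa] / n) * (count_b[xb] / n)))
    return mi


def inclusion_index(docs, a, b):
    """ASSUMED definition: share of documents containing a that also contain b."""
    with_a = [d for d in docs if a in d]
    if not with_a:
        return 0.0
    return sum(1 for d in with_a if b in d) / len(with_a)


def rank_by_information_gain(docs, labels):
    """All vocabulary terms sorted by descending IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))


def select_features(docs, labels, k, mi_threshold=0.5, inclusion_threshold=0.9):
    """Walk the IG ranking, skipping terms redundant with an already-kept term.

    ASSUMED redundancy rule: both MI and inclusion must reach their thresholds.
    """
    selected = []
    for term in rank_by_information_gain(docs, labels):
        if len(selected) == k:
            break
        redundant = any(
            term_mutual_information(docs, term, s) >= mi_threshold
            and inclusion_index(docs, term, s) >= inclusion_threshold
            for s in selected
        )
        if not redundant:
            selected.append(term)
    return selected


# Toy corpus: each document is the set of terms it contains (made-up data).
docs = [
    {"goal", "match", "the"},
    {"goal", "match", "the"},
    {"vote", "the"},
    {"vote", "election", "the"},
]
labels = ["sport", "sport", "politics", "politics"]

print(rank_by_information_gain(docs, labels)[:2])  # plain IG top-2
print(select_features(docs, labels, k=2))          # IG + redundancy filter
```

On this toy corpus, plain IG ranking keeps "goal" and "match", which always occur together, so the second feature adds no information. The filter skips "match" and keeps "vote" instead. Requiring both measures matters here: "vote" and "goal" are perfectly anti-correlated, so their mutual information is a full bit and MI alone would flag "vote" as redundant, while its inclusion index in "goal" is zero.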