MMR-based feature selection for text categorization

  • Authors:
  • Changki Lee;Gary Geunbae Lee

  • Affiliations:
  • Pohang University of Science & Technology, Hyoja-Dong, Pohang, South Korea;Pohang University of Science & Technology, Hyoja-Dong, Pohang, South Korea

  • Venue:
  • HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
  • Year:
  • 2004

Abstract

We introduce a new feature selection method for text categorization. Our MMR-based method aims to reduce redundancy among the selected features while preserving their information gain. Empirical results show that MMR-based feature selection is more effective than both Koller & Sahami's method, a greedy feature selection method, and conventional information gain, which is commonly used for feature selection in text categorization. Moreover, with MMR-based feature selection, conventional machine learning algorithms sometimes improve enough to outperform SVM, which is known to give the best classification accuracy.
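
The trade-off described in the abstract can be illustrated with a minimal sketch of greedy selection in the style of maximal marginal relevance (MMR). This is not the authors' implementation. It assumes binary term-presence features, uses the information gain of a term with respect to the class (equal to their mutual information) as relevance, and uses the mutual information between two terms as the redundancy penalty. The paper's exact redundancy term may differ. The names `mutual_information`, `mmr_select`, and the weight `lam`, as well as the toy data, are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    # fsum keeps the result independent of the order of the terms.
    return math.fsum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def mmr_select(features, labels, k, lam=0.5):
    """Greedily pick k feature names, trading relevance against redundancy.

    features: dict mapping a feature name to its 0/1 value per document.
    lam: assumed trade-off weight; lam=1.0 reduces to plain information gain.
    """
    # Relevance: information gain of each feature with respect to the class.
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(f):
            # Redundancy: strongest dependence on any already selected feature.
            redundancy = max(
                (mutual_information(features[f], features[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[f] - (1 - lam) * redundancy

        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "goal" duplicates "ball"; "match" is equally relevant but
# fires on different documents.
labels = ["sport"] * 4 + ["politics"] * 4
features = {
    "ball":  [1, 1, 0, 0, 0, 0, 0, 0],
    "goal":  [1, 1, 0, 0, 0, 0, 0, 0],
    "match": [0, 0, 1, 1, 0, 0, 0, 0],
}
selected = mmr_select(features, labels, k=2, lam=0.5)
print(selected)  # ['ball', 'match']
```

In this toy data all three terms have the same information gain, so ranking by information gain alone cannot tell the duplicate "goal" from the complementary "match". The redundancy penalty is what steers the second pick toward "match".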