Feature sub-set selection metrics for Arabic text classification

  • Authors:
  • Abdelwadood Moh'd Mesleh

  • Affiliations:
  • Computer Engineering Department, Faculty of Engineering Technology, Al-Balqa' Applied University, Amman, Jordan

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2011

Abstract

Feature sub-set selection (FSS) is an important step in building effective text classification (TC) systems. This paper presents an empirical comparison of seventeen traditional FSS metrics for TC tasks. The comparison is restricted to a support vector machine (SVM) classifier and to Arabic articles. The evaluation uses a corpus of 7842 documents independently classified into ten categories. Experimental results are reported in terms of macro-averaged precision, macro-averaged recall, and macro-averaged F1 measures. The results show that the Chi-square and Fallout FSS metrics work best for Arabic TC tasks.
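
To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It scores a term for a category with the textbook Chi-square statistic on a 2x2 term/category contingency table, and computes macro-averaged precision, recall, and F1 over categories. Several things here are assumptions: the abstract does not define Fallout, so the form used below (the share of out-of-category documents that contain the term) is the common information-retrieval definition and may differ from the paper's; macro F1 is taken as the mean of per-category F1 values, although some authors compute it from the macro precision and recall instead; and the function names chi_square, fallout, and macro_scores are hypothetical.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score of a term for one category (2x2 contingency table).

    a: in-category documents that contain the term
    b: out-of-category documents that contain the term
    c: in-category documents that lack the term
    d: out-of-category documents that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A zero marginal means the term or category carries no information.
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def fallout(b, d):
    """ASSUMED definition: fraction of out-of-category documents
    that contain the term (same b and d as in chi_square)."""
    return 0.0 if b + d == 0 else b / (b + d)


def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 for single-label output.

    Each category contributes equally, regardless of its size.
    ASSUMPTION: macro F1 is the mean of the per-category F1 values.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

In a feature-selection setting, a score such as chi_square would typically be computed for every term and category pair, and only the highest-scoring terms kept as features before training the classifier. How per-category scores are combined into one score per term (for example, maximum or average) is a separate design choice that the abstract does not specify.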