Feature selection for text classification with Naïve Bayes

  • Authors:
  • Jingnian Chen;Houkuan Huang;Shengfeng Tian;Youli Qu

  • Affiliations:
  • School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China and Department of Information and Computing Science, Shandong University of Finance, Jinan, Shando ...;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Quantified Score

Hi-index 12.06

Abstract

As an important preprocessing step in text classification, feature selection can improve the scalability, efficiency, and accuracy of a text classifier. In general, a good feature selection method should take both domain and algorithm characteristics into account. Because the Naive Bayesian classifier is simple, efficient, and highly sensitive to feature selection, research on feature selection designed specifically for it is worthwhile. This paper presents two feature evaluation metrics for the Naive Bayesian classifier applied to multi-class text datasets: Multi-class Odds Ratio (MOR) and Class Discriminating Measure (CDM). Text classification experiments with Naive Bayesian classifiers were carried out on two multi-class text collections. The results indicate that CDM and MOR select features noticeably more effectively than other feature selection approaches.
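
The abstract does not give the formulas for MOR or CDM, so the following is only a minimal sketch of the general idea of scoring terms by how well they discriminate between classes in a multi-class collection. The scoring rule (the sum over classes of the absolute log-ratio between a term's smoothed document frequency inside a class and outside it), the add-alpha smoothing, and the names `class_discrimination_scores` and `select_top_k` are all assumptions made for illustration and should not be read as the paper's definitions.

```python
import math
from collections import Counter, defaultdict


def class_discrimination_scores(docs, labels, alpha=1.0):
    """Score each term by how strongly it separates each class from the rest.

    ASSUMPTION (not taken from the abstract):
        score(t) = sum over classes c of |log(P(t|c) / P(t|not c))|
    where the probabilities are document-level and add-alpha smoothed.
    """
    classes = sorted(set(labels))
    docs_per_class = Counter(labels)
    # doc_freq[c][t] = number of documents of class c that contain term t
    doc_freq = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        doc_freq[label].update(set(tokens))

    vocab = set().union(*(doc_freq[c].keys() for c in classes))
    total_docs = len(docs)

    scores = {}
    for term in vocab:
        df_all = sum(doc_freq[c][term] for c in classes)
        score = 0.0
        for c in classes:
            df_in = doc_freq[c][term]
            df_out = df_all - df_in
            p_in = (df_in + alpha) / (docs_per_class[c] + 2 * alpha)
            p_out = (df_out + alpha) / (total_docs - docs_per_class[c] + 2 * alpha)
            score += abs(math.log(p_in / p_out))
        scores[term] = score
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "the"],
    ["match", "team", "the"],
    ["stock", "market", "the"],
    ["market", "price", "the"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = class_discrimination_scores(docs, labels)
top_terms = select_top_k(scores, 2)
```

In this toy corpus a term that appears equally often in every class, such as "the", receives a score of zero, while terms confined to one class rank highest. The selected terms would then form the vocabulary passed to a Naive Bayes classifier.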