A new feature selection score for multinomial naive Bayes text classification based on KL-divergence

  • Authors: Karl-Michael Schneider
  • Affiliations: University of Passau, Passau, Germany
  • Venue: ACLdemo '04, Proceedings of the ACL 2004 Interactive Poster and Demonstration Sessions
  • Year: 2004

Abstract

We define a new feature selection score for text classification based on the KL-divergence between the distribution of words in training documents and their classes. The score favors words that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on two standard data sets indicate that the new method outperforms mutual information, especially for smaller categories.
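The abstract does not give the score's exact formula, but the idea of ranking words by how much their class-conditional distributions diverge from their pooled distribution can be sketched as follows. This is a minimal illustration of a KL-divergence-style word score, not the score defined in the paper; `kl_score`, the smoothing constant `alpha`, and the toy corpus are all assumptions for demonstration.

```python
import math
from collections import Counter

def word_dist(docs):
    """Unigram distribution over the vocabulary of a list of tokenized docs."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl_score(word, class_docs, alpha=1e-6):
    """Illustrative KL-based score for one word: a prior-weighted sum of
    P(w|c) * log(P(w|c) / P(w)) over classes, so words whose probability
    varies strongly across classes score high. (Hypothetical helper,
    not the paper's actual score.)"""
    all_docs = [d for docs in class_docs.values() for d in docs]
    p_pool = word_dist(all_docs).get(word, 0.0) + alpha  # pooled P(w), smoothed
    n_total = len(all_docs)
    score = 0.0
    for cls, docs in class_docs.items():
        p_c = len(docs) / n_total                        # class prior P(c)
        p_wc = word_dist(docs).get(word, 0.0) + alpha    # P(w | c), smoothed
        score += p_c * p_wc * math.log(p_wc / p_pool)
    return score

# Toy corpus: "goal" is class-specific, "team" occurs in both classes.
classes = {
    "sports": [["goal", "match", "team"], ["team", "win", "goal"]],
    "tech":   [["cpu", "code", "team"], ["code", "bug", "cpu"]],
}
print(kl_score("goal", classes) > kl_score("team", classes))  # → True
```

The intuition matches the abstract: a word concentrated in one class ("goal") has class-conditional probabilities far from its pooled probability and scores high, while a word spread evenly across classes ("team") scores low.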