A new feature selection score for multinomial naive Bayes text classification based on KL-divergence

Author: Karl-Michael Schneider
Affiliation: University of Passau, Passau, Germany
Venue: ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
Year: 2004

Abstract

We define a new feature selection score for text classification based on the KL-divergence between the distribution of words in training documents and their classes. The score favors words that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on two standard data sets indicate that the new method outperforms mutual information, especially for smaller categories.
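
The intuition in the abstract can be illustrated with a small sketch. This is not necessarily the paper's exact definition: it assumes a score that, for each word, averages over training documents the word's contribution to the KL-divergence from the pooled distribution minus its contribution to the KL-divergence from the document's own class distribution. The function name `kl_feature_scores`, the add-`alpha` smoothing, and the toy corpus are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def kl_feature_scores(docs, labels, alpha=1.0):
    """Illustrative KL-style feature score (an assumption, not the paper's formula).

    docs:   list of token lists (training documents)
    labels: list of class labels, one per document
    alpha:  add-alpha smoothing constant (assumed)
    Returns a dict mapping each word w to
        mean over documents d of  p(w|d) * log( p(w|class(d)) / p(w) ).
    """
    vocab = sorted({w for d in docs for w in d})
    vocab_size = len(vocab)

    def smoothed(counts, total):
        # Smoothed relative frequencies over the whole vocabulary.
        denom = total + alpha * vocab_size
        return {w: (counts.get(w, 0) + alpha) / denom for w in vocab}

    doc_counts = [Counter(d) for d in docs]
    pooled = Counter()
    by_class = {}
    for counts, c in zip(doc_counts, labels):
        pooled.update(counts)
        by_class.setdefault(c, Counter()).update(counts)

    p_all = smoothed(pooled, sum(pooled.values()))
    p_cls = {c: smoothed(cnt, sum(cnt.values())) for c, cnt in by_class.items()}

    scores = dict.fromkeys(vocab, 0.0)
    for counts, c in zip(doc_counts, labels):
        p_doc = smoothed(counts, sum(counts.values()))
        for w in vocab:
            # Equals the word's term in KL(doc || pooled) minus its term in
            # KL(doc || class): large when the class model explains the word
            # better than the pooled model does.
            scores[w] += p_doc[w] * math.log(p_cls[c][w] / p_all[w])

    n_docs = len(docs)
    return {w: s / n_docs for w, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical toy corpus: "the" occurs everywhere, the other words are class-specific.
    toy_docs = [
        ["the", "goal", "the", "match"],
        ["goal", "the", "team"],
        ["the", "vote", "the", "law"],
        ["vote", "the", "party"],
    ]
    toy_labels = ["sports", "sports", "politics", "politics"]
    ranked = sorted(kl_feature_scores(toy_docs, toy_labels).items(),
                    key=lambda kv: kv[1], reverse=True)
    for word, score in ranked:
        print(f"{word}\t{score:.4f}")
```

Under these assumptions, a word such as "the" that is spread evenly across classes should rank below class-specific words such as "goal" or "vote". Feature selection would then keep the top-scoring words before training the multinomial naive Bayes classifier.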