We define a new feature selection score for text classification based on the KL-divergence between the distribution of words in training documents and their classes. The score favors words that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on two standard data sets indicate that the new method outperforms mutual information, especially for smaller categories.
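The abstract does not give the score's formula, so the following is only a minimal sketch of the general idea, not the authors' definition. It assumes each word is scored by its contribution to the KL-divergence between a class's word distribution and the pooled word distribution, weighted by the class prior. The function names, the Laplace smoothing parameter `alpha`, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter


def word_distribution(docs, vocab, alpha=1.0):
    """Laplace-smoothed word distribution over vocab for tokenized docs."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def kl_word_scores(docs_by_class):
    """Illustrative score (an assumption, not the paper's formula):
    the sum over classes of prior(c) * p(w|c) * log(p(w|c) / p(w)),
    where p(w) is the smoothed distribution over all documents."""
    vocab = sorted({tok for docs in docs_by_class.values()
                    for doc in docs for tok in doc})
    all_docs = [doc for docs in docs_by_class.values() for doc in docs]
    pooled = word_distribution(all_docs, vocab)
    scores = dict.fromkeys(vocab, 0.0)
    for docs in docs_by_class.values():
        prior = len(docs) / len(all_docs)
        p_class = word_distribution(docs, vocab)
        for w in vocab:
            scores[w] += prior * p_class[w] * math.log(p_class[w] / pooled[w])
    return scores


# Hypothetical toy corpus: two classes, two tokenized documents each.
docs_by_class = {
    "sports": [["goal", "match", "the"], ["goal", "team", "the"]],
    "finance": [["stock", "market", "the"], ["stock", "bank", "the"]],
}
scores = kl_word_scores(docs_by_class)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy corpus, class-specific words such as "goal" and "stock" score higher than "the", which is spread evenly across both classes. Selecting the top-ranked words would then give the reduced feature set; how the actual score handles document-level similarity within a class is not recoverable from the abstract alone.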