A frequent practice in feature selection is to maximize the Kullback-Leibler (K-L) distance between target classes. In this note we show that this common custom is often suboptimal, because it ignores the fact that classification is carried out with a finite number of samples. The K-L distance relates only to the mean separation of the classes; in classification, the variance and higher-order moments of the likelihood function should also be taken into account when selecting feature subsets. We derive appropriate expressions and show that they can lead to major increases in performance.
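To make the effect concrete, here is a minimal numerical sketch; it is not taken from the paper, and the two "features" and their class-conditional distributions are made up for illustration. Feature A has the larger K-L distance between the classes, but the variance of its log-likelihood ratio is also much larger, and a likelihood-ratio classifier deciding from only five samples makes more errors with it than with the lower-divergence feature B.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class-conditional distributions over a few discrete symbols.
# These numbers are illustrative only; they are not from the paper.
features = {
    "A": (np.array([0.800, 0.199, 0.001]),   # p(x | class 0)
          np.array([0.800, 0.001, 0.199])),  # p(x | class 1)
    "B": (np.array([0.75, 0.25]),
          np.array([0.25, 0.75])),
}

def lr_error(p0, p1, n, trials=200_000):
    """Monte Carlo error rate of the likelihood-ratio test when each
    decision is based on n i.i.d. samples, with equal class priors.
    Ties (zero total log-ratio) are split evenly between the classes."""
    logr = np.log(p0 / p1)                   # per-symbol log-likelihood ratio
    x0 = rng.choice(len(p0), size=(trials, n), p=p0)
    x1 = rng.choice(len(p1), size=(trials, n), p=p1)
    s0, s1 = logr[x0].sum(axis=1), logr[x1].sum(axis=1)
    err0 = (s0 < 0).mean() + 0.5 * (s0 == 0).mean()   # class 0 misclassified
    err1 = (s1 > 0).mean() + 0.5 * (s1 == 0).mean()   # class 1 misclassified
    return 0.5 * (err0 + err1)

for name, (p0, p1) in features.items():
    logr = np.log(p0 / p1)
    kl = np.sum(p0 * logr)               # K-L distance = mean log-ratio under class 0
    var = np.sum(p0 * logr**2) - kl**2   # variance of the log-ratio under class 0
    err = lr_error(p0, p1, n=5)
    print(f"feature {name}: KL = {kl:.2f}, Var[log-ratio] = {var:.2f}, "
          f"5-sample error = {err:.3f}")
```

On these made-up numbers, feature A roughly doubles the K-L distance (about 1.05 vs. 0.55 nats) yet misclassifies noticeably more often at n = 5, because its evidence arrives in rare, high-variance bursts of the log-likelihood ratio; this is exactly the finite-sample effect the note describes.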