Mutual information is a widely used criterion for filter feature selection. However, despite its popularity and appealing properties, it is not always the most appropriate criterion. Contrary to what is sometimes hypothesized in the literature, selecting a feature subset that maximizes the mutual information does not guarantee a decrease in the misclassification probability, which is often the objective of interest. The first goal of this paper is therefore to clearly illustrate this potential inadequacy and to emphasize that mutual information remains a heuristic, offering no guarantee in terms of classification accuracy. Through extensive experiments, we then conduct a deeper analysis of the cases in which mutual information is not a suitable criterion. This analysis confirms the general value of mutual information for feature selection, helps us better understand its behaviour throughout a feature selection process, and consequently allows us to make better use of it as a feature selection criterion.
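The filter approach discussed above can be sketched as ranking each candidate feature by its empirical mutual information with the class label and keeping the top-scoring ones. The following is a minimal illustration for discrete features, not the paper's actual procedure; the function names `mutual_information` and `select_features` are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits between two
    discrete sample sequences of equal length."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))  # joint counts
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    # I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def select_features(features, labels, k):
    """Rank features (each a list of discrete values) by I(X_i; Y)
    and return the indices of the k highest-scoring ones."""
    scores = [(mutual_information(f, labels), i)
              for i, f in enumerate(features)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]
```

As the abstract stresses, such a ranking is a heuristic: a subset maximizing this score may still not minimize the misclassification probability.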