Robust feature selection by mutual information distributions

Authors:
Marco Zaffalon;Marcus Hutter
Affiliations:
IDSIA, Manno, Switzerland;IDSIA, Manno, Switzerland
Venue:
UAI'02: Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence
Year:
2002

Abstract

Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. To address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework with a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported, and asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification with the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows the above methods to be extended efficiently to incomplete samples.
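
To make the contrast between the empirical value and its distribution concrete, the following is a minimal sketch, not the paper's method. The abstract reports analytical expressions for the mean and variance; this sketch instead swaps in plain Monte Carlo sampling from the Dirichlet posterior over the joint probabilities of two discrete variables. The function names, the uniform prior count `prior=1.0`, the number of draws, and the 95% lower credible bound are all assumptions made for illustration.

```python
import math
import random


def mutual_information(joint):
    """Mutual information (in nats) of a joint probability table (list of rows)."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0.0:  # zero cells contribute nothing
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


def empirical_mi(counts):
    """Descriptive (plug-in) mutual information computed from a contingency table."""
    n = sum(sum(r) for r in counts)
    return mutual_information([[c / n for c in r] for r in counts])


def sample_posterior_mi(counts, prior=1.0, draws=2000, seed=0):
    """Monte Carlo draws of mutual information under a Dirichlet posterior.

    Assumption: a symmetric Dirichlet prior with `prior` pseudo-counts per cell.
    A Dirichlet draw is obtained by normalizing independent gamma variates.
    """
    rng = random.Random(seed)
    rows, cols = len(counts), len(counts[0])
    alphas = [c + prior for r in counts for c in r]
    samples = []
    for _ in range(draws):
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(g)
        joint = [[g[i * cols + j] / total for j in range(cols)] for i in range(rows)]
        samples.append(mutual_information(joint))
    return samples


def credible_lower_bound(samples, mass=0.95):
    """Value that a fraction `mass` of the sampled mutual information exceeds."""
    ordered = sorted(samples)
    return ordered[int((1.0 - mass) * len(ordered))]


if __name__ == "__main__":
    # Hypothetical feature-versus-class contingency tables.
    dependent = [[20, 0], [0, 20]]
    independent = [[10, 10], [10, 10]]
    for name, table in (("dependent", dependent), ("independent", independent)):
        bound = credible_lower_bound(sample_posterior_mi(table))
        print(name, "empirical:", empirical_mi(table), "lower bound:", bound)
```

One plausible use, again an assumption rather than something stated in the abstract, is to keep a feature only when this lower bound exceeds a small threshold, so that a high empirical value obtained from few observations is not trusted on its own.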