Elements of information theory.
Wrappers for feature subset selection. Artificial Intelligence (Special Issue on Relevance).
On the Performance Assessment and Comparison of Stochastic Multiobjective Optimizers. Proceedings of the 4th International Conference on Parallel Problem Solving from Nature (PPSN IV).
Estimation of entropy and mutual information. Neural Computation.
Object Recognition with Informative Features and Linear Classification. Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV '03), Volume 2.
Learning Bayesian network classifiers by maximizing conditional likelihood. Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04).
Efficient Feature Selection via Analysis of Relevance and Redundancy. Journal of Machine Learning Research.
Large-Sample Learning of Bayesian Networks is NP-Hard. Journal of Machine Learning Research.
Fast Binary Feature Selection with Conditional Mutual Information. Journal of Machine Learning Research.
Feature selection and feature extraction for text categorization. Proceedings of the Workshop on Speech and Natural Language (HLT '91).
Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing).
Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research.
Stability of feature selection algorithms: a study on high-dimensional spaces. Knowledge and Information Systems.
A stability index for feature selection. Proceedings of the 25th IASTED International Multi-Conference: Artificial Intelligence and Applications (AIAP '07).
Stable feature selection via dense feature groups. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Stable and Accurate Feature Selection. Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD '09), Part I.
Gait feature subset selection by mutual information. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans.
On the Feature Selection Criterion Based on an Approximation of Multidimensional Mutual Information. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Maximum Likelihood in Cost-Sensitive Learning: Model Specification, Approximations, and Upper Bounds. Journal of Machine Learning Research.
Conditional infomax learning: an integrated framework for feature extraction and fusion. Proceedings of the 9th European Conference on Computer Vision (ECCV '06), Part I.
On the use of variable complementarity for feature selection in cancer classification. Proceedings of the 2006 International Conference on Applications of Evolutionary Computing (EuroGP '06).
Input feature selection for classification problems. IEEE Transactions on Neural Networks.
Efficient feature selection filters for high-dimensional data. Pattern Recognition Letters.
Information-theoretic selection of high-dimensional spectral features for structural recognition. Computer Vision and Image Understanding.
Feature Interaction Maximisation. Pattern Recognition Letters.
We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy from the one usual in the feature selection literature: instead of trying to define a criterion, we derive one directly from a clearly specified objective function, the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimise a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information-based feature selection (including Markov blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence in favour of certain classes of criteria, in particular those that balance the relative size of the relevancy and redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.
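To make the JMI criterion concrete: given an already-selected feature set S, each remaining candidate X_k is scored by J_jmi(X_k) = sum over X_j in S of I(X_k, X_j ; Y), the mutual information between the paired variable (X_k, X_j) and the class label Y, and the highest-scoring candidate is added greedily. The Python sketch below is our own illustration of this scheme, not the authors' implementation; it assumes discrete features and uses plug-in (histogram) entropy estimates, and all names (jmi_forward_select, etc.) are ours.

    import numpy as np
    from collections import Counter

    def entropy(seq):
        # Plug-in (maximum likelihood) entropy of a discrete sequence, in bits.
        seq = list(seq)
        n = len(seq)
        return -sum((c / n) * np.log2(c / n) for c in Counter(seq).values())

    def mutual_information(x, y):
        # I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from empirical counts.
        x, y = list(x), list(y)
        return entropy(x) + entropy(y) - entropy(zip(x, y))

    def jmi_score(X, y, S, k):
        # JMI score of candidate k: sum over selected j of I(X_k, X_j ; Y),
        # treating the pair (X_k, X_j) as a single joint variable.
        return sum(mutual_information(zip(X[:, k], X[:, j]), y) for j in S)

    def jmi_forward_select(X, y, num_features):
        # Greedy forward selection; the first feature is the most
        # individually relevant one, argmax_k I(X_k ; Y).
        remaining = set(range(X.shape[1]))
        first = max(remaining, key=lambda k: mutual_information(X[:, k], y))
        S, remaining = [first], remaining - {first}
        while len(S) < num_features and remaining:
            best = max(remaining, key=lambda k: jmi_score(X, y, S, k))
            S.append(best)
            remaining.remove(best)
        return S

    # Toy usage: the label depends only on features 0 and 3.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(500, 10))
    y = (X[:, 0] + X[:, 3]) % 3
    print(jmi_forward_select(X, y, 3))

On this toy example the informative features 0 and 3 should surface early. For real data the plug-in entropy estimates degrade as sample size shrinks, which is exactly the small-sample regime where the empirical study above favours JMI over criteria that weight relevancy and redundancy unevenly.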