Measuring the strength of dependence between two sets of random variables lies at the heart of many statistical problems, in particular feature selection for pattern recognition. We believe that there are some basic desirable criteria for a measure of dependence that many commonly employed measures, such as the correlation coefficient, do not satisfy. Briefly stated, a measure of dependence should: (1) be model-free and invariant under monotone transformations of the marginals; (2) fully differentiate different levels of dependence; (3) be applicable to both continuous and categorical distributions; (4) not require the dependence of X on Y to be the same as the dependence of Y on X; (5) be readily estimated from data; and (6) be straightforwardly extended to multivariate distributions. The new measure of dependence introduced in this paper, called the coefficient of intrinsic dependence (CID), satisfies these criteria. The main motivating idea is that Y is strongly (respectively, weakly) dependent on X if and only if the conditional distribution of Y given X is significantly (respectively, mildly) different from the marginal distribution of Y. We measure the difference by the normalized integrated square difference distance, so that the full range of dependence is adequately reflected in the interval [0, 1]. The paper treats estimation of the CID, provides simulations and comparisons, and applies the CID to gene prediction and cancer classification based on gene-expression measurements from microarrays.
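The motivating idea can be illustrated with a rough numerical sketch. The code below is not the estimator from the paper: it assumes one plausible reading of "normalized integrated square difference", namely the squared gap between the conditional and marginal empirical CDFs of Y, averaged over X and over the observed Y values, divided by the average of F(y)(1 - F(y)). The function name `cid_sketch`, the quantile binning of X, and the `n_bins` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def cid_sketch(x, y, n_bins=10):
    """Binned illustration of a CID-style dependence of y on x (assumed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Evaluate all CDFs at the observed y values, which amounts to
    # integrating with respect to the empirical distribution of Y.
    grid = np.sort(y)
    f_marg = np.searchsorted(grid, grid, side="right") / n

    # Assumption: approximate conditioning on X by quantile bins of x.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)

    num = 0.0
    for b in range(n_bins):
        y_b = np.sort(y[bins == b])
        if y_b.size == 0:
            continue
        # Empirical CDF of Y restricted to this bin of X.
        f_cond = np.searchsorted(y_b, grid, side="right") / y_b.size
        # Weight the squared CDF gap by the fraction of samples in the bin.
        num += (y_b.size / n) * np.mean((f_cond - f_marg) ** 2)

    # Normalizer: average variance of the indicator 1{Y <= u}.
    den = np.mean(f_marg * (1.0 - f_marg))
    return num / den if den > 0 else 0.0

rng = np.random.default_rng(0)
x_unif = rng.uniform(-1.0, 1.0, size=2000)
noise = rng.normal(size=2000)

dep_monotone = cid_sketch(x_unif, x_unif ** 3)    # y is a monotone function of x
dep_none = cid_sketch(x_unif, noise)              # y is independent of x
dep_y_on_x = cid_sketch(x_unif, x_unif ** 2)      # y is determined by x
dep_x_on_y = cid_sketch(x_unif ** 2, x_unif)      # x is only partly determined by y
```

Because the sketch uses only ranks of Y and quantile bins of X, it is unchanged by strictly increasing transformations of either variable, and the last two calls show the asymmetry of criterion (4): the value for Y given X is larger than the value for X given Y when Y = X². The binning caps the value below 1 even for a deterministic relationship, so finer bins (with enough data per bin) move it closer to the ends of [0, 1].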