The coefficient of intrinsic dependence (feature selection using el CID)

Authors:
Tailen Hsing; Li-Yu Liu; Marcel Brun; Edward R. Dougherty
Affiliations:
Department of Statistics, Texas A&M University, College Station, TX, USA; Department of Statistics, Texas A&M University, College Station, TX, USA; Department of Biochemistry and Molecular Biology, University of Louisville, KY, USA; Department of Electrical Engineering, Texas A&M University, 3128 TAMU, College Station, TX 77843-3128, USA and Department of Pathology, University of Texas M. D. Anderson Cancer Center, Houston, T ...
Venue:
Pattern Recognition
Year:
2005

Citing 4

Floating search methods in feature selection (Pattern Recognition Letters)
On the use of MDL principle in gene expression prediction (EURASIP Journal on Applied Signal Processing - Nonlinear signal and image processing - part I)
Efficient selection of feature sets possessing high coefficients of determination based on incremental determinations (Signal Processing - Special issue: Genomic signal processing)
Is cross-validation better than resubstitution for ranking genes? (Bioinformatics)

Cited 5

Markov blanket-embedded genetic algorithm for gene selection (Pattern Recognition)
A parameterless feature ranking algorithm based on MI (Neurocomputing)
Importance degree of features and feature selection (FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1)
The copula echo state network (Pattern Recognition)
Improved feature selection algorithm based on SVM and correlation (ISNN'06 Proceedings of the Third international conference on Advances in Neural Networks - Volume Part I)

Abstract

Measuring the strength of dependence between two sets of random variables lies at the heart of many statistical problems, in particular feature selection for pattern recognition. We believe that there are some basic desirable criteria for a measure of dependence that are not satisfied by many commonly employed measures, such as the correlation coefficient. Briefly stated, a measure of dependence should: (1) be model-free and invariant under monotone transformations of the marginals; (2) fully differentiate different levels of dependence; (3) be applicable to both continuous and categorical distributions; (4) not require the dependence of X on Y to be the same as the dependence of Y on X; (5) be readily estimated from data; and (6) be straightforwardly extended to multivariate distributions. The new measure of dependence introduced in this paper, called the coefficient of intrinsic dependence (CID), satisfies these criteria. The main motivating idea is that Y is strongly (weakly, resp.) dependent on X if and only if the conditional distribution of Y given X is significantly (mildly, resp.) different from the marginal distribution of Y. We measure the difference by the normalized integrated square difference distance so that the full range of dependence can be adequately reflected in the interval [0, 1]. The paper treats estimation of the CID, provides simulations and comparisons, and applies the CID to gene prediction and cancer classification based on gene-expression measurements from microarrays.
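
The motivating idea in the abstract (compare the conditional distribution of Y given X with the marginal distribution of Y, then normalize into [0, 1]) can be illustrated with a rough numerical sketch. The abstract does not give the paper's estimator, so everything below is an assumption made for illustration: the function name cid_sketch, the choice to condition on X through equal-frequency bins, the n_bins parameter, and the use of empirical CDFs evaluated at the observed Y values. The numerator averages the squared difference between each bin's conditional CDF and the marginal CDF; the denominator is the same quantity's upper bound, the integrated variance F(y)(1 - F(y)).

```python
import numpy as np

def cid_sketch(x, y, n_bins=10):
    """Illustrative dependence score of y on x in [0, 1].

    Assumption: this is NOT the paper's estimator, only a binned
    empirical-CDF sketch of the idea stated in the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumption: approximate "given X" by equal-frequency bins of x.
    ranks = np.argsort(np.argsort(x))
    bins = ranks * n_bins // len(x)

    # ind[i, j] = 1 if y[i] <= y[j]; column means give empirical CDFs
    # evaluated at each observed y[j].
    ind = (y[:, None] <= y[None, :]).astype(float)
    marginal = ind.mean(axis=0)

    # Weighted average over bins of the integrated squared difference
    # between the conditional and marginal CDFs.
    num = 0.0
    for b in np.unique(bins):
        mask = bins == b
        conditional = ind[mask].mean(axis=0)
        num += mask.mean() * np.mean((conditional - marginal) ** 2)

    # Normalizer: integrated variance of the indicator 1{Y <= y}.
    den = np.mean(marginal * (1.0 - marginal))
    return num / den if den > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.normal(size=500)
score_dependent = cid_sketch(x, x ** 2)                  # nonmonotone dependence
score_independent = cid_sketch(x, rng.normal(size=500))  # no dependence
```

In this sketch the score depends on x only through its ranks, so it is unchanged by strictly increasing transformations of x, and swapping the arguments generally gives a different value, in line with criteria (1) and (4). The bin count trades bias against noise and would need tuning in practice.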