The gene-label correlation provides an effective measure of the relevancy of a gene. However, this measure evaluates genes individually, so the resulting gene sets may exhibit severe redundancy. In this study, we propose a new correlation heuristic for set-based gene selection that aims to alleviate the redundancy problem. The heuristic consists of two components that account for gene relevancy and redundancy, respectively. The relevancy of a gene is evaluated by its individual correlation with the class label, while the redundancy of a gene with respect to a given gene subset is measured by its correlation with a new dimension built upon that subset. The heuristic thus retains the simplicity of individual gene evaluation and the redundancy-handling capacity of set-based gene evaluation. Two ways of combining the relevancy and redundancy measures are presented: one maximizes the ratio of the relevancy measure to the redundancy measure, and the other maximizes the relevancy measure minus the redundancy measure. Experimental studies on six gene expression problems show that both criteria produce excellent results.
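The two criteria described above can be illustrated with a greedy forward-selection sketch. This is not the authors' implementation: the abstract does not say how the "new dimension" is built from the selected subset, so the code below assumes, purely for illustration, that it is the mean of the standardized selected genes. The function names `abs_corr` and `select_genes`, the use of absolute Pearson correlation, and the small `eps` guard in the ratio are likewise hypothetical choices.

```python
import numpy as np


def abs_corr(a, b):
    """Absolute Pearson correlation of two 1-D arrays (0.0 if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else float(abs(a @ b) / denom)


def select_genes(X, y, k, criterion="ratio", eps=1e-6):
    """Greedily pick k gene (column) indices from X, shape (samples, genes).

    criterion="ratio":      maximize relevancy / redundancy
    criterion="difference": maximize relevancy - redundancy
    """
    n_genes = X.shape[1]
    # Relevancy: each gene's individual correlation with the label.
    relevancy = np.array([abs_corr(X[:, j], y) for j in range(n_genes)])
    # Start from the single most relevant gene.
    selected = [int(np.argmax(relevancy))]

    while len(selected) < min(k, n_genes):
        # ASSUMPTION: the "new dimension" summarizing the current subset is
        # the mean of its standardized genes. The source does not define it.
        Z = X[:, selected]
        std = Z.std(axis=0)
        std[std == 0] = 1.0
        summary = ((Z - Z.mean(axis=0)) / std).mean(axis=1)

        best_j, best_score = None, -np.inf
        for j in range(n_genes):
            if j in selected:
                continue
            # Redundancy: correlation of the candidate with the summary dimension.
            redundancy = abs_corr(X[:, j], summary)
            if criterion == "ratio":
                score = relevancy[j] / (redundancy + eps)
            else:
                score = relevancy[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Toy data: genes 0 and 1 carry the same signal; gene 2 carries a different one.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
b = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
X_toy = np.column_stack([a, 3 * a + 1, b])
y_toy = 2 * a + b

picked_ratio = select_genes(X_toy, y_toy, k=2, criterion="ratio")
picked_diff = select_genes(X_toy, y_toy, k=2, criterion="difference")
```

On this toy data, ranking genes by relevancy alone would return the two duplicated genes, whereas either criterion picks one of the duplicates followed by gene 2, which is less relevant on its own but not redundant with the first pick.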