Correlation-based relevancy and redundancy measures for efficient gene selection

Authors:
Kezhi Z. Mao;Wenyin Tang
Affiliations:
School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore;School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore
Venue:
PRIB'07 Proceedings of the 2nd IAPR international conference on Pattern recognition in bioinformatics
Year:
2007

Abstract

Gene-label correlation is an effective measure of a gene's relevancy. However, it evaluates genes individually, so the resulting gene sets may be severely redundant. In this study, we propose a new correlation heuristic for set-based gene selection that aims to alleviate this redundancy problem. The heuristic has two components, accounting for gene relevancy and gene redundancy respectively. The relevancy of a gene is evaluated individually as its correlation with the class label, while its redundancy with respect to a given gene subset is measured by its correlation with a new dimension built upon that subset. The heuristic thus retains the simplicity of individual gene evaluation together with the redundancy-handling capacity of set-based gene evaluation. We present two ways of combining the two measures: maximizing the ratio of relevancy to redundancy, and maximizing the difference obtained by subtracting redundancy from relevancy. Experimental studies on six gene expression problems show that both criteria produce excellent results.
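
The two criteria described in the abstract can be illustrated with a minimal greedy forward-selection sketch. It is not the authors' implementation: the abstract does not say how the "new dimension" is built from the selected subset, so the code below assumes, purely for illustration, that it is the mean of the standardized selected genes. The function names `abs_corr` and `select_genes`, the `criterion` and `eps` parameters, and the use of absolute Pearson correlation are likewise assumptions.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation of two 1-D arrays (0.0 if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else abs(a @ b) / denom

def select_genes(X, y, k, criterion="ratio", eps=1e-6):
    """Greedily pick k gene (column) indices from X, shape (n_samples, n_genes).

    criterion="ratio":      maximize relevancy / redundancy
    criterion="difference": maximize relevancy - redundancy
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_genes = X.shape[1]

    # Relevancy: each gene's correlation with the class label, computed once.
    relevancy = np.array([abs_corr(X[:, j], y) for j in range(n_genes)])

    # Standardize columns so the summary dimension is not dominated by scale.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # Start from the single most relevant gene.
    selected = [int(np.argmax(relevancy))]
    while len(selected) < min(k, n_genes):
        # ASSUMPTION: the "new dimension" is the mean of the standardized
        # selected genes; the source does not specify its construction.
        summary = Z[:, selected].mean(axis=1)
        best_j, best_score = None, -np.inf
        for j in range(n_genes):
            if j in selected:
                continue
            # Redundancy: correlation of the candidate with the summary dimension.
            redundancy = abs_corr(X[:, j], summary)
            if criterion == "ratio":
                score = relevancy[j] / (redundancy + eps)  # eps avoids division by zero
            else:
                score = relevancy[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Under this assumption, a candidate that is an exact rescaled copy of an already selected gene has redundancy 1 and is penalized by both criteria, whereas ranking on gene-label correlation alone would score the copy as highly as the original. Each candidate costs only one extra correlation per step, which reflects the simplicity of individual gene evaluation that the abstract emphasizes. A simple mean lets negatively correlated genes cancel each other, so a real implementation would likely need a more careful construction of the summary dimension.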