Correlation-based relevancy and redundancy measures for efficient gene selection

  • Authors:
  • Kezhi Z. Mao; Wenyin Tang

  • Affiliations:
  • School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore; School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore

  • Venue:
  • PRIB'07 Proceedings of the 2nd IAPR international conference on Pattern recognition in bioinformatics
  • Year:
  • 2007

Abstract

The gene-label correlation provides an effective measure of the relevancy of a gene. However, this measure evaluates genes on an individual basis, and the gene sets thus obtained may exhibit severe redundancy. In this study, we propose a new correlation heuristic for set-based gene selection, with the goal of alleviating the redundancy problem. The new heuristic consists of two components that account for gene relevancy and redundancy, respectively. The relevancy of a gene is evaluated individually in terms of its correlation with the class label, while the redundancy of a gene with respect to a given gene subset is measured by its correlation with a new dimension built upon that subset. The heuristic thus retains the simplicity of individual gene evaluation together with the redundancy-handling capacity of set-based gene evaluation. Two ways of combining the relevancy and redundancy measures are presented: one maximizes the ratio of the relevancy measure to the redundancy measure, and the other maximizes the relevancy measure minus the redundancy measure. Experimental studies on six gene expression problems show that both criteria produce excellent results.
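
To make the two selection criteria concrete, the following is a minimal Python sketch of a greedy forward selection that scores each candidate gene by either the ratio or the difference of its relevancy and redundancy. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the "new dimension" is built from the selected subset, so the sketch assumes it is the average of the standardized selected genes. The use of absolute Pearson correlation, the small constant `eps`, and all function names (`pearson`, `subset_dimension`, `select_genes`) are likewise hypothetical choices made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    """Scale a sequence to zero mean and unit variance (all zeros if constant)."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [0.0] * n if sd == 0 else [(a - m) / sd for a in x]


def subset_dimension(genes, selected):
    # ASSUMPTION: the "new dimension" is taken to be the per-sample mean of
    # the standardized selected genes; the paper's construction may differ.
    cols = [standardize(genes[j]) for j in selected]
    return [sum(vals) / len(cols) for vals in zip(*cols)]


def select_genes(genes, labels, k, criterion="ratio", eps=1e-6):
    """Greedily pick k genes; `genes` is a list of per-gene expression lists."""
    if criterion not in ("ratio", "difference"):
        raise ValueError("criterion must be 'ratio' or 'difference'")
    # Relevancy: absolute correlation of each gene with the class label.
    relevancy = [abs(pearson(g, labels)) for g in genes]
    # Start from the individually most relevant gene.
    selected = [max(range(len(genes)), key=relevancy.__getitem__)]
    while len(selected) < min(k, len(genes)):
        dim = subset_dimension(genes, selected)
        best, best_score = None, float("-inf")
        for j in range(len(genes)):
            if j in selected:
                continue
            # Redundancy: absolute correlation with the subset-derived dimension.
            redundancy = abs(pearson(genes[j], dim))
            if criterion == "ratio":
                score = relevancy[j] / (redundancy + eps)  # eps avoids division by zero
            else:
                score = relevancy[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy data: 4 genes x 8 samples. Gene 1 is a near-copy of gene 0,
# gene 2 is relevant with a different within-class pattern, gene 3 is noise.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [0, 1, 0, 1, 2, 3, 2, 3],
    [0, 1, 0, 1, 2, 3, 2, 4],
    [0, 0, 1, 1, 1, 2, 3, 3],
    [1, 0, 1, 0, 0, 1, 0, 1],
]
print(select_genes(genes, labels, 2, "ratio"))
print(select_genes(genes, labels, 2, "difference"))
```

On this toy data, ranking genes by relevancy alone would place genes 0 and 1 at the top even though gene 1 nearly duplicates gene 0. Both criteria in the sketch instead pair gene 0 with gene 2, which is the kind of redundancy reduction the abstract describes.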