Gene and sample selection for cancer classification with support vectors based t-statistic

  • Authors:
  • Piyushkumar A. Mundra;Jagath C. Rajapakse

  • Affiliations:
  • Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore;Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore and Singapore-MIT Alliance, Singapore and Department of Biological Engineering, Massachu ...

  • Venue:
  • Neurocomputing
  • Year:
  • 2010

Abstract

The t-statistic is widely used for gene ranking in the analysis of microarray gene expression data. Such a filter-based criterion is generally computed using all the training samples, although not all of them may be equally important for the classification task. In this paper, we decompose the t-statistic into two parts, corresponding to relevant and irrelevant data points. The relevant data points are selected using support vectors and are then used to compute the t-statistic for feature selection. By simultaneously selecting data points and genes, significantly better classification results are achieved on synthetic data as well as on several benchmark cancer datasets.
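
The idea of computing the t-statistic only on the relevant samples can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula or the decomposition, so the Welch (unequal-variance) form of the two-sample t-statistic is assumed here, and the names `t_statistic`, `rank_genes`, and `support_idx` are hypothetical. The support-vector indices are taken as given; in practice they would come from a trained SVM (for example, the `support_` attribute of scikit-learn's `SVC`).

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(pos, neg):
    """Absolute two-sample t-statistic for one gene.

    Assumption: Welch (unequal-variance) form. Each class needs at
    least two samples so that the sample variance is defined.
    """
    diff = abs(mean(pos) - mean(neg))
    se = sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0.0:
        # Zero spread in both classes: perfectly separated or identical.
        return float("inf") if diff > 0 else 0.0
    return diff / se


def rank_genes(X, y, sample_idx=None):
    """Rank genes by |t| computed on the selected samples only.

    X          : list of samples, each a list of expression values
    y          : class labels, +1 or -1
    sample_idx : indices of the samples to use (e.g. support vectors);
                 None means all training samples (the usual filter)
    Returns (gene indices sorted by decreasing score, list of scores).
    """
    idx = list(range(len(X))) if sample_idx is None else list(sample_idx)
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        pos = [X[i][g] for i in idx if y[i] == 1]
        neg = [X[i][g] for i in idx if y[i] == -1]
        scores.append(t_statistic(pos, neg))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order, scores


# Toy data (made up for illustration): 6 samples, 2 genes.
X = [
    [1.0, 5.0],
    [2.0, 5.5],
    [1.5, 4.0],
    [8.0, 5.2],
    [9.0, 4.8],
    [8.5, 5.6],
]
y = [1, 1, 1, -1, -1, -1]

# Hypothetical support-vector indices, as if returned by a trained SVM.
support_idx = [1, 2, 3, 5]

order_all, scores_all = rank_genes(X, y)              # all samples
order_sv, scores_sv = rank_genes(X, y, support_idx)   # support vectors only
```

Passing `sample_idx=None` reproduces the conventional filter that uses every training sample, so the two rankings can be compared directly. How the paper alternates between sample selection and gene selection is not stated in the abstract and is not modeled here.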