Gene Selection for Multi-Class Prediction of Microarray Data

  • Authors:
  • Dechang Chen; Dong Hua; Jaques Reifman; Xiuzhen Cheng

  • Venue:
  • CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
  • Year:
  • 2003

Abstract

Gene expression data from microarrays have been successfully applied to class prediction, where the purpose is to classify and predict the diagnostic category of a sample by its gene expression profile. A typical microarray dataset consists of expression levels for a large number of genes on a relatively small number of samples. As a consequence, one basic and important question associated with class prediction is: how do we identify a small subset of informative genes contributing the most to the classification task? Many methods have been proposed, but most focus on two-class problems, such as discrimination between normal and disease samples. This paper addresses selecting informative genes for multi-class prediction problems by considering all the classes jointly. Our approach is based on the power of the genes in discriminating among the different classes (e.g., tumor types) and the existing correlation between genes. We model the expression levels of a given gene by a one-way analysis of variance model with heterogeneity of variances, and determine the discriminatory power of the gene by a test statistic designed to test the equality of the class means. In other words, the discriminatory power of a gene is associated with a Behrens-Fisher problem. Informative genes are chosen such that each selected gene has a high discriminatory power and the correlation between any pair of selected genes is low. Test statistics considered in this paper include the ANOVA F test statistic, the Brown-Forsythe test statistic, the Cochran test statistic, and the Welch test statistic. Their performances are evaluated over several classification methods applied to two publicly available microarray datasets. The results show that the Brown-Forsythe test statistic achieves the best performance.
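The selection idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' procedure: the abstract does not specify how the ranking and the correlation constraint are combined, so the greedy pass, the function names `brown_forsythe_scores` and `select_genes`, and the `max_abs_corr` threshold of 0.8 are all assumptions. The score uses the standard form of the Brown-Forsythe statistic for equality of means under unequal variances, and the sketch assumes every class has at least two samples and every gene has nonzero within-class variance.

```python
import numpy as np


def brown_forsythe_scores(X, y):
    """Per-gene Brown-Forsythe statistic for equality of class means.

    X: array of shape (n_samples, n_genes); y: class label per sample.
    Score = sum_i n_i (mean_i - grand_mean)^2 / sum_i (1 - n_i/N) s_i^2,
    where s_i^2 is the unbiased sample variance within class i.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_total = X.shape[0]
    grand_mean = X.mean(axis=0)
    numer = np.zeros(X.shape[1])
    denom = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        numer += n_c * (Xc.mean(axis=0) - grand_mean) ** 2
        denom += (1.0 - n_c / n_total) * Xc.var(axis=0, ddof=1)
    return numer / denom


def select_genes(X, y, n_genes, max_abs_corr=0.8):
    """Greedy selection (an assumed scheme): walk genes from highest to
    lowest score and keep a gene only if its absolute Pearson correlation
    with every already-selected gene is below max_abs_corr."""
    X = np.asarray(X, dtype=float)
    scores = brown_forsythe_scores(X, y)
    corr = np.corrcoef(X, rowvar=False)  # gene-by-gene correlation matrix
    selected = []
    for g in np.argsort(scores)[::-1]:
        if all(abs(corr[g, s]) < max_abs_corr for s in selected):
            selected.append(int(g))
        if len(selected) == n_genes:
            break
    return selected
```

Calling `select_genes(X, y, n_genes=50)` on a samples-by-genes matrix would return up to 50 column indices. Swapping in the ANOVA F, Cochran, or Welch statistic only requires replacing the scoring function, which is how the four statistics could be compared under the same selection scheme.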