An effective filtering gene selection method for microarray data via shuffling and statistical analysis

  • Authors:
  • Zejin Jason Ding; Yan-Qing Zhang

  • Affiliations:
  • Georgia State University, Atlanta, GA; Georgia State University, Atlanta, GA

  • Venue:
  • Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
  • Year:
  • 2010

Abstract

Correlation-based filtering methods for gene selection have been shown to be quite effective for microarray data analysis, and hundreds of such methods have been proposed in the literature. In this paper, we extend the notion of correlation between genes and sample status: we regard the relation between a gene vector and the label vector as informative only if it is unique, in the sense that it cannot be replicated by randomly shuffling the gene expression values or the sample status data. A two-layer statistical analysis is performed on the original microarray data and on label-shuffled data to identify the important gene markers. We design a simple metric, the difference of signal-to-noise between the positive and negative classes, that does not work well when used directly to select the top informative genes (as verified with a linear SVM classifier); however, after collecting and ranking the second-level significance values of every gene on the original and many shuffled microarray datasets, the top selected genes show much better classification performance. Results on several public microarray datasets show that genes selected by our method can also lead to high leave-one-out prediction accuracy.
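
The shuffle-and-rank idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything concrete here is an assumption made for illustration: the per-class signal-to-noise is taken as mean divided by standard deviation, the "second-level significance" is taken as a z-score of the observed metric against its distribution over label-shuffled data, and the names `snr_difference`, `shuffle_significance`, `select_top_genes`, `n_shuffles`, and `k` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def snr_difference(X, y, eps=1e-12):
    """First-level metric (assumed form): per-gene absolute difference
    between mean/std in the positive class and mean/std in the negative class.

    X: (n_samples, n_genes) expression matrix; y: 0/1 label vector.
    """
    pos, neg = X[y == 1], X[y == 0]
    snr_pos = pos.mean(axis=0) / (pos.std(axis=0) + eps)
    snr_neg = neg.mean(axis=0) / (neg.std(axis=0) + eps)
    return np.abs(snr_pos - snr_neg)


def shuffle_significance(X, y, n_shuffles=200, seed=0):
    """Second-level score (assumed form): z-score of each gene's observed
    metric relative to the same metric computed on label-shuffled data."""
    rng = np.random.default_rng(seed)
    observed = snr_difference(X, y)
    null = np.empty((n_shuffles, X.shape[1]))
    for i in range(n_shuffles):
        # Shuffling the labels breaks any real gene/label relation.
        null[i] = snr_difference(X, rng.permutation(y))
    return (observed - null.mean(axis=0)) / (null.std(axis=0) + 1e-12)


def select_top_genes(X, y, k=10, n_shuffles=200, seed=0):
    """Return the indices of the k genes with the highest second-level score."""
    z = shuffle_significance(X, y, n_shuffles=n_shuffles, seed=seed)
    return np.argsort(z)[::-1][:k]
```

Under these assumptions, a gene scores highly only when its observed class difference stands out from what random label assignments produce for that same gene, which is the "cannot be replicated by shuffling" criterion. The selected indices could then be passed to a linear SVM with leave-one-out evaluation, as the abstract describes.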