An effective filtering gene selection method for microarray data via shuffling and statistical analysis

Authors:
Zejin Jason Ding;Yan-Qing Zhang
Affiliations:
Georgia State University, Atlanta, GA;Georgia State University, Atlanta, GA
Venue:
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
Year:
2010

Abstract

Correlation-based filter methods for gene selection have been shown to be quite effective for microarray data analysis, and hundreds of such methods have been proposed in the literature. In this paper, we extend the notion of correlation between genes and sample status in a broader way: the relation between a gene vector and the label vector is considered unique if it cannot be replicated by randomly shuffling the gene expression values or the sample status data. A two-layer statistical analysis is performed on the original microarrays and on label-shuffled data to identify the important gene markers. We design a simple metric, the difference in signal-to-noise ratio between the positive and negative classes, that does not work well for directly selecting the top informative genes (as verified with a linear SVM classifier); however, after collecting and ranking the second-level significance values of every gene on the original and many shuffled microarray datasets, the top selected genes show much better classification performance. Results on several public microarray datasets show that genes selected by our method can also lead to high leave-one-out prediction accuracy.
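
The shuffling idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything concrete here is an assumption: the per-class signal-to-noise is taken as mean divided by standard deviation, the second-level significance is taken as the absolute z-score of the observed metric against its label-shuffled distribution, and the function names (`snr_difference`, `shuffle_significance`, `rank_genes`) are hypothetical. It is not the authors' implementation.

```python
import random
import statistics

EPS = 1e-9  # guards against division by a zero standard deviation


def snr_difference(values, labels):
    """Assumed first-level metric: SNR of the positive class minus SNR of the negative class.

    SNR is taken here as mean / standard deviation, which is an assumption.
    Each class needs at least two samples for the standard deviation.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    snr_pos = statistics.mean(pos) / (statistics.stdev(pos) + EPS)
    snr_neg = statistics.mean(neg) / (statistics.stdev(neg) + EPS)
    return snr_pos - snr_neg


def shuffle_significance(values, labels, n_shuffles=200, seed=0):
    """Assumed second-level score: |z| of the observed metric against label-shuffled metrics."""
    rng = random.Random(seed)
    observed = snr_difference(values, labels)
    shuffled = list(labels)  # copy, so the caller's labels are not modified
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # permuting labels keeps the class sizes fixed
        null_scores.append(snr_difference(values, shuffled))
    mu = statistics.mean(null_scores)
    sigma = statistics.stdev(null_scores) + EPS
    return abs(observed - mu) / sigma


def rank_genes(expression, labels, n_shuffles=200, seed=0):
    """Rank genes by second-level score, highest first.

    expression maps a gene name to its expression values, one per sample.
    The same seed is reused so every gene sees the same shuffled label sets.
    """
    scores = {
        gene: shuffle_significance(values, labels, n_shuffles, seed)
        for gene, values in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (invented for illustration, not from the paper).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expression = {
    "gene_a": [9.1, 8.7, 9.4, 8.9, 2.1, 2.5, 1.8, 2.2],  # tracks the labels
    "gene_b": [5.0, 5.3, 4.8, 5.1, 5.1, 4.8, 5.3, 5.0],  # same values in both classes
}
print(rank_genes(expression, labels))
```

In this sketch a gene scores highly only when its observed metric is hard to reproduce under shuffled labels, which mirrors the uniqueness criterion described above. A rank-based or p-value-based second-level statistic would be an equally plausible reading of the abstract.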