Correlation-based filter methods for gene selection have proven effective for microarray data analysis, and hundreds of such methods have been proposed in the literature. In this paper, we broaden the notion of correlation between genes and sample status: the relation between a gene vector and the label vector is considered unique when it cannot be replicated by randomly shuffling the gene expression values or the sample status data. A two-layer statistical analysis is performed on the original microarrays and on label-shuffled data to identify important gene markers. We design a simple metric, the difference in signal-to-noise ratio between the positive and negative classes, which does not work well for directly selecting the top informative genes (as verified with a linear SVM classifier). However, after collecting and ranking the second-level significance values of every gene on the original and many shuffled microarray data sets, the top selected genes show much better classification performance. Results on several public microarray data sets show that genes selected by our method also lead to high leave-one-out prediction accuracy.
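The label-shuffling idea can be illustrated with a minimal sketch. The abstract does not define the metric or the second-level significance precisely, so everything below is an assumption: signal-to-noise is taken as the within-class mean divided by the within-class standard deviation, and the paper's two-layer analysis is replaced by a plain empirical permutation p-value. The function names `snr_difference`, `shuffled_label_pvalues`, and `rank_genes` are hypothetical and do not come from the paper.

```python
import numpy as np


def snr_difference(X, y, eps=1e-12):
    """Per-gene difference of class-wise signal-to-noise ratios.

    X: (n_samples, n_genes) expression matrix.
    y: binary labels, where 1 marks the positive class.
    Assumption: SNR within a class is mean / standard deviation.
    """
    X = np.asarray(X, dtype=float)
    is_pos = np.asarray(y) == 1
    pos, neg = X[is_pos], X[~is_pos]
    snr_pos = pos.mean(axis=0) / (pos.std(axis=0, ddof=1) + eps)
    snr_neg = neg.mean(axis=0) / (neg.std(axis=0, ddof=1) + eps)
    return snr_pos - snr_neg


def shuffled_label_pvalues(X, y, n_shuffles=200, seed=0):
    """Empirical p-value per gene from label shuffling.

    Assumption: a standard permutation test stands in for the paper's
    second-level significance, which the abstract does not specify.
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(snr_difference(X, y))
    exceed = np.zeros(observed.shape[0], dtype=int)
    for _ in range(n_shuffles):
        y_perm = rng.permutation(y)  # break the gene/label relation
        exceed += np.abs(snr_difference(X, y_perm)) >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (exceed + 1) / (n_shuffles + 1)


def rank_genes(X, y, n_shuffles=200, seed=0):
    """Return gene indices ordered from most to least significant."""
    pvals = shuffled_label_pvalues(X, y, n_shuffles, seed)
    return np.argsort(pvals, kind="stable"), pvals
```

A gene whose observed metric is rarely matched under shuffled labels receives a small p-value and ranks near the top. The selected genes could then be passed to a linear SVM with leave-one-out evaluation, as the abstract describes.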