A novel multi-stage feature selection method for microarray expression data analysis
International Journal of Data Mining and Bioinformatics
Selecting the most informative cancer-related genes from large microarray gene expression datasets is an important and challenging research topic in bioinformatics. This paper presents a novel algorithm for the gene selection task, Granular Support Vector Machines - Recursive Feature Elimination (GSVM-RFE). As a biologically meaningful hybrid of statistical learning theory and granular computing theory, GSVM-RFE can separately eliminate irrelevant, redundant, or noisy genes in different granules at different stages, and can select positively related and negatively related genes in balance. Simulation results on the prostate cancer dataset show that GSVM-RFE is statistically significantly more accurate than traditional algorithms for prostate cancer classification. More importantly, GSVM-RFE extracts a compact "perfect" subset of 17 genes that achieves 100% accuracy. To the best of our knowledge, this is the first time such a "perfect" gene subset has been reported, and it is expected to be helpful for prostate cancer research.
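For readers unfamiliar with recursive feature elimination, the following is a minimal sketch of the plain SVM-RFE loop that the abstract's method takes its name from. It is not the granular GSVM-RFE algorithm, whose granule construction and staging are not described here. Everything in it is an illustrative assumption: the function names `train_linear_svm` and `svm_rfe`, the subgradient-descent trainer, the hyperparameters, and the synthetic toy data (which is unrelated to the prostate cancer dataset).

```python
import random


def train_linear_svm(X, y, features, epochs=200, lr=0.01, lam=0.01):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent.

    Only the columns listed in `features` are used. Returns one weight per
    listed feature, in the same order. Hyperparameters are arbitrary choices.
    """
    w = [0.0] * len(features)
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Margin is computed once, before any weight is updated.
            margin = yi * (sum(wj * xi[f] for wj, f in zip(w, features)) + b)
            violated = margin < 1
            for j, f in enumerate(features):
                grad = lam * w[j] - (yi * xi[f] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w


def svm_rfe(X, y, n_keep):
    """Repeatedly retrain and drop the feature with the smallest squared weight."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        w = train_linear_svm(X, y, features)
        worst = min(range(len(features)), key=lambda j: w[j] ** 2)
        features.pop(worst)
    return features


# Synthetic toy data (hypothetical): labels are +1 / -1.
rng = random.Random(0)
X, y = [], []
for _ in range(40):
    label = rng.choice([-1, 1])
    row = [
        label * 2.0 + rng.gauss(0, 0.5),   # column 0: positively related to the label
        -label * 2.0 + rng.gauss(0, 0.5),  # column 1: negatively related to the label
        rng.gauss(0, 1.0),                 # columns 2-4: pure noise
        rng.gauss(0, 1.0),
        rng.gauss(0, 1.0),
    ]
    X.append(row)
    y.append(label)

selected = svm_rfe(X, y, n_keep=2)
print(sorted(selected))
```

Ranking by squared weight treats positively and negatively weighted features alike, so a feature survives on the magnitude of its contribution rather than its sign. How GSVM-RFE itself balances the two groups is not specified in the abstract and is not modeled in this sketch.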