Development of Two-Stage SVM-RFE Gene Selection Strategy for Microarray Expression Data Analysis

Authors:
Yuchun Tang; Yan-Qing Zhang; Zhen Huang
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2007

Abstract

Extracting a subset of informative genes from microarray expression data is a critical data preparation step in cancer classification and other biological function analyses. Although many algorithms have been developed, the Support Vector Machine - Recursive Feature Elimination (SVM-RFE) algorithm is one of the best gene feature selection algorithms. It rests on the assumption that a smaller "filter-out" factor, which eliminates fewer gene features in each recursion, should lead to extraction of a better gene subset. Our simulations show that this assumption is not always correct: SVM-RFE is highly sensitive to the "filter-out" factor, which makes it an unstable algorithm. To select a set of key gene features for reliable prediction of cancer types or subtypes and for other applications, we have developed a new two-stage SVM-RFE algorithm. The first stage is designed to eliminate most of the irrelevant, redundant, and noisy genes while keeping information loss small. The second stage then performs a fine selection to produce the final gene subset. The two-stage SVM-RFE overcomes the instability problem of SVM-RFE and thereby achieves better algorithm utility. Our analysis of three publicly available microarray expression datasets demonstrates that the two-stage SVM-RFE is significantly more accurate and more reliable than SVM-RFE and three correlation-based methods. Furthermore, the two-stage SVM-RFE is computationally efficient: its time complexity is $O(d \log_2 d)$, where $d$ is the size of the original gene set.
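The role of the "filter-out" factor can be illustrated with a small sketch that only counts recursions; it does not train an SVM or rank genes. The function names, the `switch_at` parameter, and the rule of dropping at least one gene per recursion are assumptions made for illustration and are not taken from the paper. The two-stage variant assumes that the first stage is a coarse fractional elimination and that the second stage removes one gene per recursion, which is one plausible reading of the abstract.

```python
def rfe_schedule(d, filter_out, target=1):
    """Return the gene-set sizes visited by one recursive elimination run.

    d          -- size of the original gene set
    filter_out -- fraction of the remaining genes dropped per recursion
                  (0 < filter_out < 1); at least one gene is always dropped
    target     -- stop once this many genes remain
    """
    if not 0 < filter_out < 1:
        raise ValueError("filter_out must be in (0, 1)")
    sizes = [d]
    while sizes[-1] > target:
        n = sizes[-1]
        drop = max(1, int(n * filter_out))   # genes eliminated this recursion
        sizes.append(max(target, n - drop))  # never go below the target size
    return sizes


def two_stage_schedule(d, coarse_filter_out, switch_at, target=1):
    """Hypothetical two-stage schedule.

    Stage 1: fractional elimination until `switch_at` genes remain.
    Stage 2: remove one gene per recursion until `target` genes remain.
    """
    coarse = rfe_schedule(d, coarse_filter_out, target=switch_at)
    fine = list(range(coarse[-1] - 1, target - 1, -1))
    return coarse + fine


if __name__ == "__main__":
    # Halving the gene set each time needs log2(d) recursions.
    print(len(rfe_schedule(1024, 0.5)) - 1)            # 10
    # A very small factor degenerates to one gene per recursion.
    print(len(rfe_schedule(1024, 0.001)) - 1)          # 1023
    # Coarse halving down to 64 genes, then one-by-one elimination.
    print(len(two_stage_schedule(1024, 0.5, 64)) - 1)  # 67
```

In a real pipeline each entry in the schedule would correspond to one SVM training on the remaining genes, so the schedule length gives a rough sense of how the "filter-out" factor trades running time against the granularity of the selection.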