Gene selection using support vector machines with non-convex penalty

Authors:
Hao Helen Zhang; Jeongyoun Ahn; Xiaodong Lin; Cheolwoo Park
Affiliations:
Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA; Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, USA; Department of Mathematical Sciences, University of Cincinnati, OH 45221, USA; Department of Statistics, University of Georgia, Athens, GA 30602, USA
Venue:
Bioinformatics
Year:
2006

Cited by (25):
Cancer classification by gradient LDA technique using microarray gene expression data. Data & Knowledge Engineering.
Gene Selection for Cancer Classification Using DCA. ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications.
Gene Expression Data Classification Using Independent Variable Group Analysis. ISNN '08 Proceedings of the 5th international symposium on Neural Networks: Advances in Neural Networks, Part II.
Biological pathways as features for microarray data classification. Proceedings of the 2nd international workshop on Data and text mining in bioinformatics.
Selecting marker genes for cancer classification using supervised weighted kernel clustering and the support vector machine. Computational Statistics & Data Analysis.
Evaluating switching neural networks through artificial and real gene expression data. Artificial Intelligence in Medicine.
Feature selection via Boolean independent component analysis. Information Sciences: an International Journal.
Sparse Support Vector Machines with L_p Penalty for Biomarker Identification. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Correlation-based relevancy and redundancy measures for efficient gene selection. PRIB'07 Proceedings of the 2nd IAPR international conference on Pattern recognition in bioinformatics.
Variable selection via combined penalization for high-dimensional data analysis. Computational Statistics & Data Analysis.
Feature selection in the Laplacian support vector machine. Computational Statistics & Data Analysis.
Gene expression data classification using locally linear discriminant embedding. Computers in Biology and Medicine.
A Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data. Computational Statistics & Data Analysis.
Recursive Mahalanobis Separability Measure for Gene Subset Selection. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Gene selection and prediction for cancer classification using support vector machines with a reject option. Computational Statistics & Data Analysis.
Review: Supervised classification and mathematical optimization. Computers and Operations Research.
Support Vector Machines with L1 penalty for detecting gene-gene interactions. International Journal of Data Mining and Bioinformatics.
Gene selection for cancer tumor detection using a novel memetic algorithm with a multi-view fitness function. Engineering Applications of Artificial Intelligence.
Sparse maximum margin discriminant analysis for feature extraction and gene selection on gene expression data. Computers in Biology and Medicine.
Sparse high-dimensional fractional-norm support vector machine via DC programming. Computational Statistics & Data Analysis.
A fast algorithm for kernel 1-norm support vector machines. Knowledge-Based Systems.
Analysis of programming properties and the row-column generation method for 1-norm support vector machines. Neural Networks.
Efficient feature size reduction via predictive forward selection. Pattern Recognition.
Comparing the learning effectiveness of BP, ELM, I-ELM, and SVM for corporate credit ratings. Neurocomputing.
Selection of genes mediating certain cancers, using a neuro-fuzzy approach. Neurocomputing.

Abstract

Motivation: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. One current difficulty in interpreting microarray data is their inherently 'high-dimensional, low sample size' nature: thousands of genes are measured on comparatively few samples. Therefore, robust and accurate gene selection methods are required to identify groups of genes that are differentially expressed across samples, e.g. between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide a unified procedure for simultaneous gene selection and cancer classification, achieving high accuracy in both aspects.

Results: In this paper we develop a novel type of regularization in support vector machines (SVMs) to identify important genes for cancer classification. A non-convex penalty, the smoothly clipped absolute deviation (SCAD) penalty, is added to the hinge loss function of the SVM. By systematically thresholding small estimates to zero, the new procedure eliminates redundant genes automatically and yields a compact and accurate classifier. A successive quadratic algorithm is proposed to convert the non-differentiable and non-convex optimization problem into a sequence of easily solved linear equation systems. The method is applied to two real datasets with very promising results.

Availability: MATLAB codes are available upon request from the authors.

Contact: hzhang@stat.ncsu.edu

Supplementary information: http://www4.stat.ncsu.edu/~hzhang/research.html
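
The successive quadratic idea described in the abstract can be illustrated with a small sketch. This is a hedged illustration, not the authors' MATLAB code. It assumes the objective (1/n) sum_i [1 - y_i(b + w.x_i)]_+ + sum_j p_lam(|w_j|) with labels y_i in {+1, -1} and an unpenalized intercept b, the usual piecewise SCAD form with the commonly used default a = 3.7, and one particular quadratic scheme: the hinge loss is written as [r]_+ = (r + |r|)/2, and both |r| and the SCAD penalty are replaced by local quadratic approximations at the current estimate, so each step is a single linear system. The ridge starting point, the threshold zero_tol, the floor eps and all function names are hypothetical choices made for this example.

```python
# Hypothetical sketch (not the authors' MATLAB code); names and defaults are assumptions.

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|): L1-like near zero, quadratic blend, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def scad_derivative(theta, lam, a=3.7):
    """p'_lam(|theta|): equal to lam up to lam, then decays linearly to 0 at a*lam."""
    t = abs(theta)
    return lam if t <= lam else max(a * lam - t, 0.0) / (a - 1)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (pure Python)."""
    n = len(rhs)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def weighted_solve(Z, y, cols, row_w, rhs_w, diag):
    """Solve (sum_i row_w[i] z_i z_i^T + diag(diag)) beta = sum_i rhs_w[i] y_i z_i,
    with each z_i restricted to the columns listed in `cols`."""
    n, k = len(Z), len(cols)
    A = [[sum(row_w[i] * Z[i][cols[r]] * Z[i][cols[c]] for i in range(n))
          + (diag[r] if r == c else 0.0) for c in range(k)] for r in range(k)]
    b = [sum(rhs_w[i] * y[i] * Z[i][cols[r]] for i in range(n)) for r in range(k)]
    return solve(A, b)


def scad_svm(X, y, lam, a=3.7, iters=100, zero_tol=1e-3, eps=1e-6):
    """Minimise (1/n) sum_i [1 - y_i (b + w.x_i)]_+ + sum_j p_lam(|w_j|) approximately.

    y must hold +1/-1 labels; the intercept b is not penalised.
    Returns (b, w); thresholded coefficients are exactly 0.
    """
    n, p = len(X), len(X[0])
    Z = [[1.0] + list(row) for row in X]        # column 0 is the intercept
    active = list(range(p + 1))                 # columns still in the model
    # Assumed starting point: ridge-regularised least squares on the labels.
    beta = weighted_solve(Z, y, active, [1.0] * n, [1.0] * n, [0.0] + [1e-2 * n] * p)
    for _ in range(iters):
        # Hinge: [r]_+ = (r + |r|)/2 with r_i = 1 - y_i f_i; |r| ~ r^2/(2c) + c/2, c = |r_old|.
        f = [sum(Z[i][j] * bj for j, bj in zip(active, beta)) for i in range(n)]
        c = [max(abs(1.0 - y[i] * f[i]), eps) for i in range(n)]
        # Threshold small coefficients to exactly zero and drop them for good.
        keep = [k for k, j in enumerate(active) if j == 0 or abs(beta[k]) > zero_tol]
        active, beta = [active[k] for k in keep], [beta[k] for k in keep]
        # SCAD: p(|w|) ~ const + d * w^2 with d = p'(|w_old|) / (2|w_old|).
        # The factor 4n comes from multiplying the surrogate's gradient equation by 2n.
        diag = [0.0 if j == 0 else 4 * n * scad_derivative(bj, lam, a) / (2 * abs(bj))
                for j, bj in zip(active, beta)]
        # Zero gradient of the quadratic surrogate gives this linear system.
        new = weighted_solve(Z, y, active, [1.0 / ci for ci in c],
                             [1.0 + 1.0 / ci for ci in c], diag)
        shift = max(abs(u - v) for u, v in zip(new, beta))
        beta = new
        if shift < 1e-9:
            break
    w = [0.0] * p
    for j, bj in zip(active, beta):
        if j > 0:
            w[j - 1] = bj
    return beta[0], w
```

With rows of `X` as expression profiles and `y` holding +1/-1 class labels, `b, w = scad_svm(X, y, lam=0.1)` returns a linear classifier whose zero entries of `w` mark genes removed by the thresholding step; in practice `lam` would need to be tuned, for instance by cross-validation. The dense pure-Python solver is only meant for small toy inputs.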