A Bayesian hybrid Huberized support vector machine and its applications in high-dimensional medical data

  • Authors:
  • Sounak Chakraborty; Ruixin Guo

  • Affiliations:
  • Department of Statistics, University of Missouri-Columbia, 209F Middlebush Hall, Columbia, MO 65211-6100, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, 3101 McGavran-Greenberg, CB 7420, Chapel Hill, NC 27599, USA

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2011

Abstract

A hybrid Huberized support vector machine (HHSVM) with an elastic-net penalty has been developed for cancer tumor classification based on thousands of gene expression measurements. In this paper, we develop a Bayesian formulation of the hybrid Huberized support vector machine for binary classification. For the coefficients of the linear classification boundary, we propose a new type of prior, which can select variables and group them together simultaneously. Our proposed prior is a scale mixture of normal distributions and independent gamma priors on a transformation of the variance of the normal distributions. We establish a direct connection between the Bayesian HHSVM model with our special prior and the standard HHSVM solution with the elastic-net penalty. We propose a hierarchical Bayes technique and an empirical Bayes technique to select the penalty parameter. In the hierarchical Bayes model, the penalty parameter is selected using a beta prior. For the empirical Bayes model, we estimate the penalty parameter by maximizing the marginal likelihood. The proposed model is applied to two simulated data sets and three real-life gene expression microarray data sets. Results suggest that our Bayesian models are highly successful in selecting groups of similarly behaved important genes and predicting the cancer class. Most of the genes selected by our models have shown strong association with well-studied genetic pathways, further validating our claims.
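
The abstract does not state the HHSVM objective itself, so the following is only a minimal sketch of the penalized criterion that the Bayesian model is said to connect to. It assumes the form commonly used in the HHSVM literature: a Huberized hinge loss with threshold `delta` plus an elastic-net penalty with weights `lam1` and `lam2`. The function names, parameter names, and the default `delta=2.0` are illustrative assumptions, not taken from the paper.

```python
def huberized_hinge(t, delta=2.0):
    """Huberized hinge loss at margin t = y * f(x) (assumed standard form)."""
    if t > 1.0:
        return 0.0                                # correctly classified with margin
    if t > 1.0 - delta:
        return (1.0 - t) ** 2 / (2.0 * delta)     # smooth quadratic region
    return 1.0 - t - delta / 2.0                  # linear region, as in the hinge loss


def elastic_net_penalty(beta, lam1, lam2):
    """Elastic-net penalty: lam1 * ||beta||_1 + (lam2 / 2) * ||beta||_2^2."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam1 * l1 + 0.5 * lam2 * l2_squared


def hhsvm_objective(beta0, beta, X, y, lam1, lam2, delta=2.0):
    """Penalized objective for a linear boundary f(x) = beta0 + x . beta.

    X is a list of feature rows; y holds labels coded as +1 or -1.
    """
    total_loss = 0.0
    for row, label in zip(X, y):
        f = beta0 + sum(x_j * b_j for x_j, b_j in zip(row, beta))
        total_loss += huberized_hinge(label * f, delta)
    return total_loss + elastic_net_penalty(beta, lam1, lam2)
```

In this sketch the L1 term is what drives coefficients to exactly zero (variable selection), while the squared L2 term encourages correlated predictors to receive similar coefficients (grouping), which matches the two behaviors the abstract attributes to the proposed prior.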