Bayesian multiple response kernel regression model for high dimensional data and its practical applications in near infrared spectroscopy

  • Authors:
  • Sounak Chakraborty

  • Affiliations:
  • -

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2012

Quantified Score

Hi-index 0.03

Visualization

Abstract

Non-linear regression based on reproducing kernel Hilbert space (RKHS) has recently become very popular in fitting high-dimensional data. The RKHS formulation provides an automatic dimension reduction of the covariates. This is particularly helpful when the number of covariates (p) far exceed the number of data points. In this paper, we introduce a Bayesian nonlinear multivariate regression model for high-dimensional problems. Our model is suitable when we have multiple correlated observed response corresponding to same set of covariates. We introduce a robust Bayesian support vector regression model based on a multivariate version of Vapnik's @e-insensitive loss function. The likelihood corresponding to the multivariate Vapnik's @e-insensitive loss function is constructed as a scale mixture of truncated normal and gamma distribution. The regression function is constructed using the finite representation of a function in the reproducing kernel Hilbert space (RKHS). The kernel parameter is estimated adaptively by assigning a prior on it and using the Markov chain Monte Carlo (MCMC) techniques for computation. Practical applications of our model are demonstrated via applications in near-infrared (NIR) spectroscopy and simulation studies. Our Bayesian kernel models are highly accurate in predicting composition of materials based on its near infrared (NIR) spectroscopy signature. We have compared our method with popularly used methodologies in NIR spectroscopy, like partial least square (PLS), principal component regression (PCA), support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF). In all the simulation and real case studies, our multivariate Bayesian RKHS regression model outperforms the standard methods by a substantially large margin. The implementation of our models based on MCMC is fairly fast and straight forward.