2005 Special Issue: Bayesian approach to feature selection and parameter tuning for support vector machine classifiers

Authors: Carl Gold, Alex Holub, Peter Sollich
Affiliations: Computation and Neural Systems, California Institute of Technology, 139-74, Pasadena, CA 91125, USA (Gold, Holub); Department of Mathematics, King's College London, Strand, London WC2R 2LS, UK (Sollich)
Venue: Neural Networks, 2005 Special Issue: IJCNN 2005
Year: 2005

Abstract

A Bayesian view of SVM classifiers allows one to define a quantity analogous to the evidence in probabilistic models. By maximizing this quantity, one can systematically tune hyperparameters and, via automatic relevance determination (ARD), select relevant input features. Evidence gradients are expressed as averages over the associated posterior and can be approximated using Hybrid Monte Carlo (HMC) sampling. We describe how a Nyström approximation of the Gram matrix can be used to reduce sampling times significantly while leaving classification accuracy almost unchanged. In experiments on classification problems with a substantial number of irrelevant features, this approach to ARD can give a significant improvement in classification performance over more traditional, non-ARD SVM systems. The final tuned hyperparameter values provide a useful criterion for pruning irrelevant features, and we define a measure of relevance with which to determine systematically how many features should be removed. This use of ARD for hard feature selection can improve classification accuracy in non-ARD SVMs. In the majority of cases, however, we find that on data sets constructed by human domain experts, the performance of non-ARD SVMs is largely insensitive to the presence of some less relevant features. Eliminating such features via ARD then does not improve classification accuracy, but it leads to impressive reductions in the number of features required, of up to 75%.
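To make the sampling step concrete, here is a minimal Hybrid (Hamiltonian) Monte Carlo sketch in plain Python that estimates a posterior average E[f(x)] for a density proportional to exp(-energy(x)). It is an illustration under stated assumptions, not the authors' implementation: the function name `hmc_average`, its parameters (`n_leapfrog`, `step`, `burn_in`, `seed`) and the Gaussian stand-in target in the demo are all hypothetical. In the setting described above, the target would be the SVM posterior and f the integrand whose posterior average gives one component of the evidence gradient.

```python
import math
import random
from typing import Callable, List, Sequence

Vec = List[float]


def hmc_average(
    f: Callable[[Vec], float],
    energy: Callable[[Vec], float],
    grad_energy: Callable[[Vec], Vec],
    x0: Sequence[float],
    n_samples: int = 2000,
    n_leapfrog: int = 20,
    step: float = 0.1,
    burn_in: int = 200,
    seed: int = 0,
) -> float:
    """Estimate E_p[f(x)] for p(x) proportional to exp(-energy(x)) by Hybrid Monte Carlo.

    Hypothetical helper for illustration; step size and trajectory length need tuning
    for any real posterior.
    """
    rng = random.Random(seed)
    x = list(x0)
    total = 0.0
    for it in range(burn_in + n_samples):
        # Draw a fresh Gaussian momentum for each trajectory.
        p = [rng.gauss(0.0, 1.0) for _ in x]
        h_old = energy(x) + 0.5 * sum(pi * pi for pi in p)
        # Leapfrog integration: half momentum step, alternating full steps,
        # then a final half momentum step.
        x_new = list(x)
        g = grad_energy(x_new)
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]
        for s in range(n_leapfrog):
            x_new = [xi + step * pi for xi, pi in zip(x_new, p)]
            g = grad_energy(x_new)
            if s < n_leapfrog - 1:
                p = [pi - step * gi for pi, gi in zip(p, g)]
        p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]
        h_new = energy(x_new) + 0.5 * sum(pi * pi for pi in p)
        # Metropolis test corrects for the integrator's discretization error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        # Accumulate f only after burn-in.
        if it >= burn_in:
            total += f(x)
    return total / n_samples


if __name__ == "__main__":
    # Hypothetical stand-in target: Gaussian with mean (1, -2) and unit covariance.
    mu = [1.0, -2.0]
    gauss_energy = lambda x: 0.5 * sum((xi - m) ** 2 for xi, m in zip(x, mu))
    gauss_grad = lambda x: [xi - m for xi, m in zip(x, mu)]
    # Estimate of E[x0]; its true value is 1.0.
    print(hmc_average(lambda x: x[0], gauss_energy, gauss_grad, [0.0, 0.0]))
```

Each iteration resamples the momentum, follows approximate Hamiltonian dynamics with leapfrog steps, and accepts or rejects the endpoint with a Metropolis test, so the chain leaves the target distribution invariant even though the integrator is inexact. The gradient of the energy is the only model-specific ingredient, which is why a cheaper Gram-matrix representation can speed up sampling.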
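The Nyström idea can be illustrated in a few lines: choose m landmark points and approximate the n x n Gram matrix by K ≈ K_nm K_mm^-1 K_mn, which has rank at most m and needs only m*n kernel evaluations. The sketch below uses plain Python for readability. The ARD squared-exponential kernel `ard_rbf`, the helper names, the hand-picked landmark indices and the small diagonal `jitter` are assumptions for illustration, since the abstract does not state which kernel or landmark scheme the authors use.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def ard_rbf(x: Vector, y: Vector, length_scales: Vector, amplitude: float = 1.0) -> float:
    """Squared-exponential kernel with one length scale per input feature (ARD).

    A very large length scale makes the corresponding feature almost irrelevant.
    """
    d2 = sum(((xi - yi) / ell) ** 2 for xi, yi, ell in zip(x, y, length_scales))
    return amplitude * math.exp(-0.5 * d2)


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B (A is m x m) by Gauss-Jordan elimination with partial pivoting."""
    m, k = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, m + k):
                    aug[r][c] -= factor * aug[col][c]
    # The left block is now diagonal: divide each row by its pivot.
    return [[aug[i][m + j] / aug[i][i] for j in range(k)] for i in range(m)]


def nystrom_gram(
    points: Sequence[Vector],
    landmark_idx: Sequence[int],
    kernel: Callable[[Vector, Vector], float],
    jitter: float = 1e-8,
) -> Matrix:
    """Rank-m Nystrom approximation K ~= K_nm K_mm^{-1} K_mn of the n x n Gram matrix."""
    landmarks = [points[i] for i in landmark_idx]
    m, n = len(landmarks), len(points)
    # m x m kernel among landmarks; the jitter keeps the solve well conditioned.
    k_mm = [[kernel(a, b) + (jitter if i == j else 0.0)
             for j, b in enumerate(landmarks)] for i, a in enumerate(landmarks)]
    # m x n kernel between landmarks and all points.
    k_mn = [[kernel(a, x) for x in points] for a in landmarks]
    w = solve(k_mm, k_mn)  # K_mm^{-1} K_mn, shape m x n
    # Forming the full n x n product is for illustration only;
    # in practice one keeps the low-rank factors.
    return [[sum(k_mn[r][i] * w[r][j] for r in range(m)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    kern = lambda a, b: ard_rbf(a, b, length_scales=[1.0, 2.0])
    approx = nystrom_gram(pts, [0, 3], kern)
    err = max(abs(approx[i][j] - kern(pts[i], pts[j])) for i in range(5) for j in range(5))
    print(f"max abs error with 2 of 5 landmarks: {err:.3g}")
```

For points chosen as landmarks, the approximation reproduces the exact kernel rows, and with every point as a landmark it recovers the full Gram matrix up to the jitter. The practical saving comes from working with the m x n factor rather than forming the full matrix as this toy version does. In the ARD kernel, a very large length scale for a feature makes its contribution to the distance negligible, which is why tuned ARD hyperparameters can serve as a pruning criterion.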