A Bayesian point of view on SVM classifiers allows the definition of a quantity analogous to the evidence in probabilistic models. By maximizing this quantity one can systematically tune hyperparameters and, via automatic relevance determination (ARD), select relevant input features. Evidence gradients are expressed as averages over the associated posterior and can be approximated using Hybrid Monte Carlo (HMC) sampling. We describe how a Nyström approximation of the Gram matrix can be used to reduce sampling times significantly while leaving classification accuracy almost unchanged. In experiments on classification problems with a substantial number of irrelevant features, this approach to ARD can give a significant improvement in classification performance over more traditional non-ARD SVM systems. The final tuned hyperparameter values provide a useful criterion for pruning irrelevant features, and we define a measure of relevance with which to determine systematically how many features should be removed. This use of ARD for hard feature selection can improve the classification accuracy of non-ARD SVMs. In the majority of cases, however, we find that on data sets constructed by human domain experts the performance of non-ARD SVMs is largely insensitive to the presence of some less relevant features. Eliminating such features via ARD then does not improve classification accuracy, but leads to impressive reductions in the number of features required, by up to 75%.
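The abstract names two concrete ingredients: an ARD kernel with one lengthscale per input feature (the hyperparameters tuned by evidence maximization) and a Nyström low-rank approximation of the Gram matrix used to cheapen the HMC sampling. The paper's own implementation is not reproduced here; the NumPy sketch below only illustrates those two ideas, and the function names, random landmark selection, and use of a pseudo-inverse are our own illustrative assumptions, not the authors' method.

    import numpy as np

    def ard_rbf_kernel(X, Z, lengthscales):
        # ARD Gaussian kernel: one lengthscale per input dimension.
        # A large lengthscale flattens the kernel along that dimension,
        # effectively switching the feature off; small lengthscales
        # mark relevant features.
        Xs = X / lengthscales
        Zs = Z / lengthscales
        sq = (Xs ** 2).sum(1)[:, None] + (Zs ** 2).sum(1)[None, :] - 2.0 * Xs @ Zs.T
        return np.exp(-0.5 * np.maximum(sq, 0.0))

    def nystrom_gram(X, lengthscales, m=100, seed=0):
        # Rank-m Nystrom approximation K ~= K_nm K_mm^+ K_nm^T, built
        # from m randomly chosen landmark rows of X. Keeping the two
        # low-rank factors lets matrix-vector products inside each HMC
        # step cost O(n*m) instead of the O(n^2) of the full Gram matrix.
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
        K_nm = ard_rbf_kernel(X, X[idx], lengthscales)
        K_mm = ard_rbf_kernel(X[idx], X[idx], lengthscales)
        return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

In the usual ARD convention, the tuned lengthscale vector then doubles as a relevance criterion of the kind the abstract describes: features whose lengthscales grow very large contribute almost nothing to the kernel and become candidates for hard removal.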