The nature of statistical learning theory
The nature of statistical learning theory
A view of the EM algorithm that justifies incremental, sparse, and other variants
Learning in graphical models
Efficient SVM Regression Training with SMO
Machine Learning
Ridge Regression Learning Algorithm in Dual Variables
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Sparse bayesian learning and the relevance vector machine
The Journal of Machine Learning Research
The Journal of Machine Learning Research
Use of the zero norm with linear models and kernel methods
The Journal of Machine Learning Research
The Minimum Error Minimax Probability Machine
The Journal of Machine Learning Research
An efficient method for simplifying support vector machines
ICML '05 Proceedings of the 22nd international conference on Machine learning
Improvements to Platt's SMO Algorithm for SVM Classifier Design
Neural Computation
Input space versus feature space in kernel-based methods
IEEE Transactions on Neural Networks
Maxi–Min Margin Machine: Learning Large Margin Classifiers Locally and Globally
IEEE Transactions on Neural Networks
Hi-index | 0.00 |
Support vector machines (SVM) are state-of-the-art classifiers. Typically L2-norm or L1-norm is adopted as a regularization term in SVMs, while other norm-based SVMs, for example, the L0-norm SVM or even the L∞-norm SVM, are rarely seen in the literature. The major reason is that L0-norm describes a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization problems needs to be solved, thus making it practical in many real applications. The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing the explicit form. Hence, this builds a connection between Bayesian learning and the kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L0-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L0-norm is competitive with or even better than the standard L2-norm SVM in terms of accuracy but with a reduced number of support vectors,-9.46% of the number on average. When compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparse properties with a training speed over seven times faster.