Sparse approximation through boosting for learning large scale kernel machines

Authors:
Ping Sun;Xin Yao
Affiliations:
Birmingham Health and Wellbeing Partnership, Edgbaston, Birmingham, UK and Centre of Excellence for Research in Computational Intelligence and Applications, University of Birmingham, Edgbaston, Bi ...;Centre of Excellence for Research in Computational Intelligence and Applications, University of Birmingham, Edgbaston, Birmingham, UK
Venue:
IEEE Transactions on Neural Networks
Year:
2010

Abstract

Recently, sparse approximation has become a preferred method for learning large scale kernel machines. This technique attempts to represent the solution with only a subset of the original data points, also known as basis vectors, which are usually chosen one by one with a forward selection procedure based on some selection criterion. The computational complexity of several resultant algorithms scales as O(NM^2) in time and O(NM) in memory, where N is the number of training points and M is the number of basis vectors as well as the number of forward selection steps. For some large scale data sets, obtaining a better solution sometimes requires including more basis vectors, which means that M is not trivial in this situation. However, limited computational resources (e.g., memory) prevent us from including too many vectors. To handle this dilemma, we propose to add an ensemble of basis vectors instead of only one at each forward step. The proposed method, closely related to gradient boosting, could decrease the required number M of forward steps significantly, and thus a large fraction of the computational cost is saved. Numerical experiments on three large scale regression tasks and a classification problem demonstrate the effectiveness of the proposed approach.
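
The idea of adding several basis vectors per forward step can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes squared loss (so the boosting target at each step is simply the current residual), a Gaussian kernel, a random candidate pool, and a correlation-based selection score. The names rbf_kernel, boosted_sparse_fit, predict and all parameters are hypothetical and chosen only for this illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def boosted_sparse_fit(X, y, steps=5, batch=4, pool_size=50, gamma=1.0, ridge=1e-3, seed=0):
    """Illustrative forward selection that adds `batch` basis vectors per step.

    Assumption: squared loss, so each step fits the current residual.
    """
    rng = np.random.default_rng(seed)
    residual = np.asarray(y, dtype=float).copy()
    available = np.ones(X.shape[0], dtype=bool)
    basis_idx, weights = [], []
    history = [np.linalg.norm(residual)]
    for _ in range(steps):
        unused = np.flatnonzero(available)
        if unused.size == 0:
            break
        # Assumption: score only a random pool of unused points to keep the step cheap.
        pool = rng.choice(unused, size=min(pool_size, unused.size), replace=False)
        K_pool = rbf_kernel(X, X[pool], gamma)
        # Selection score: normalized correlation of each kernel column with the residual.
        scores = np.abs(K_pool.T @ residual) / np.linalg.norm(K_pool, axis=0)
        best = np.argsort(scores)[-batch:]
        top, K_b = pool[best], K_pool[:, best]
        # Fit the whole batch to the residual with a small ridge term for stability.
        beta = np.linalg.solve(K_b.T @ K_b + ridge * np.eye(top.size), K_b.T @ residual)
        residual -= K_b @ beta
        available[top] = False
        basis_idx.extend(top)
        weights.extend(beta)
        history.append(np.linalg.norm(residual))
    return X[np.array(basis_idx, dtype=int)], np.array(weights), history


def predict(X_new, basis, weights, gamma=1.0):
    """Evaluate the sparse kernel expansion at new points."""
    return rbf_kernel(X_new, basis, gamma) @ weights
```

With steps forward steps and batch vectors per step, the sketch ends with steps * batch basis vectors, and the returned history records the residual norm after each step. Because every batch is fitted to the residual by regularized least squares, that norm cannot increase from one step to the next.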