Optimized Parameter Search for Large Datasets of the Regularization Parameter and Feature Selection for Ridge Regression

  • Authors:
  • Pieter Buteneers, Ken Caluwaerts, Joni Dambre, David Verstraeten, Benjamin Schrauwen

  • Affiliations:
  • Electronics and Information Systems, Ghent University, Ghent, Belgium 9000

  • Venue:
  • Neural Processing Letters
  • Year:
  • 2013


Abstract

In this paper we propose mathematical optimizations to select the optimal regularization parameter for ridge regression using cross-validation. The resulting algorithm is suited for large datasets, and its computational cost does not depend on the size of the training set. We extend this algorithm to forward and backward feature selection, in which the optimal regularization parameter is selected for each candidate feature set. These feature selection algorithms yield solutions with a sparse weight matrix while using a quadratic cost on the norm of the weights. A naive approach to optimizing the ridge regression parameter has a computational complexity of order $$O(RKN^{2}M)$$, with $$R$$ the number of applied regularization parameters, $$K$$ the number of folds in the validation set, $$N$$ the number of input features and $$M$$ the number of data samples in the training set. Our implementation has a computational complexity of order $$O(KN^3)$$. This cost is smaller than that of regression without regularization, $$O(N^2M)$$, for large datasets, and it is independent of both the number of applied regularization parameters and the size of the training set. Combined with feature selection, the algorithm has complexity $$O(RKNN_s^3)$$ and $$O(RKN^3N_r)$$ for forward and backward feature selection respectively, with $$N_s$$ the number of selected features and $$N_r$$ the number of removed features. This is a factor $$M$$ faster than the $$O(RKNN_s^3M)$$ and $$O(RKN^3N_rM)$$ of the naive implementation, with $$N \ll M$$ for large datasets. To demonstrate the performance and the reduction in computational cost, we apply this technique to train recurrent neural networks using the reservoir computing approach, windowed ridge regression, least-squares support vector machines (LS-SVMs) in primal space using the fixed-size LS-SVM approximation, and extreme learning machines.
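The key idea behind such a speedup, paying the $$O(N^2M)$$ data pass and an $$O(N^3)$$ decomposition once and then evaluating each regularization parameter at a cost independent of $$M$$, can be sketched in NumPy as follows. This is a minimal illustration of the general eigendecomposition trick, not the authors' cross-validated implementation; the function name `ridge_sweep` and the single-training-set setting are assumptions for the example:

```python
import numpy as np

def ridge_sweep(X, y, lambdas):
    """Sketch: fit ridge regression for many regularization values cheaply.

    Precompute the Gram matrix and its eigendecomposition once; each
    candidate lambda then costs O(N^2), independent of the number of
    samples M and of how many lambdas are tried.
    """
    # One-time O(N^2 M) pass over the data.
    G = X.T @ X          # N x N Gram matrix
    c = X.T @ y          # N-vector X^T y
    # One-time O(N^3) eigendecomposition (G is symmetric PSD).
    eigvals, V = np.linalg.eigh(G)
    Vc = V.T @ c
    weights = {}
    for lam in lambdas:
        # (G + lam*I)^{-1} c via the eigenbasis: scale by 1/(eigval + lam).
        weights[lam] = V @ (Vc / (eigvals + lam))
    return weights
```

Each solution matches a direct solve of $$(X^TX + \lambda I)w = X^Ty$$, but the loop over candidate $$\lambda$$ values never touches the $$M$$ training samples again; the paper's contribution extends this reuse to $$K$$-fold cross-validation and to incremental feature-set updates.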