Optimized Parameter Search for Large Datasets of the Regularization Parameter and Feature Selection for Ridge Regression

  • Authors:
  • Pieter Buteneers, Ken Caluwaerts, Joni Dambre, David Verstraeten, Benjamin Schrauwen

  • Affiliations:
  • Electronics and Information Systems, Ghent University, Ghent, Belgium 9000

  • Venue:
  • Neural Processing Letters
  • Year:
  • 2013


Abstract

In this paper we propose mathematical optimizations to select the optimal regularization parameter for ridge regression using cross-validation. The resulting algorithm is suited for large datasets, and its computational cost does not depend on the size of the training set. We extend this algorithm to forward and backward feature selection, in which the optimal regularization parameter is selected for each candidate feature set. These feature selection algorithms yield solutions with a sparse weight matrix while using a quadratic cost on the norm of the weights. A naive approach to optimizing the ridge regression parameter has a computational complexity of order $$O(RKN^{2}M)$$, with $$R$$ the number of applied regularization parameters, $$K$$ the number of folds in the validation set, $$N$$ the number of input features and $$M$$ the number of data samples in the training set. Our implementation has a computational complexity of order $$O(KN^3)$$. This cost is smaller than that of regression without regularization, $$O(N^2M)$$, for large datasets, and it is independent of both the number of applied regularization parameters and the size of the training set. Combined with feature selection, the algorithm has complexity $$O(RKNN_s^3)$$ and $$O(RKN^3N_r)$$ for forward and backward feature selection respectively, with $$N_s$$ the number of selected features and $$N_r$$ the number of removed features. This is a factor $$M$$ faster than the $$O(RKNN_s^3M)$$ and $$O(RKN^3N_rM)$$ of the naive implementation, with $$N \ll M$$ for large datasets. To demonstrate the performance and the reduction in computational cost, we apply this technique to train recurrent neural networks using the reservoir computing approach, windowed ridge regression, least-squares support vector machines (LS-SVMs) in primal space using the fixed-size LS-SVM approximation, and extreme learning machines.
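The key idea behind such a speedup, paying the $$O(N^2M)$$ data pass and an $$O(N^3)$$ decomposition once and then evaluating each regularization parameter at a cost independent of $$M$$, can be sketched in NumPy as follows. This is a minimal illustration of the general eigendecomposition trick, not the authors' cross-validated implementation; the function name `ridge_sweep` and the single-training-set setting are assumptions for the example:

```python
import numpy as np

def ridge_sweep(X, y, lambdas):
    """Sketch: fit ridge regression for many regularization values cheaply.

    Precompute the Gram matrix and its eigendecomposition once; each
    candidate lambda then costs O(N^2), independent of the number of
    samples M and of how many lambdas are tried.
    """
    # One-time O(N^2 M) pass over the data.
    G = X.T @ X          # N x N Gram matrix
    c = X.T @ y          # N-vector X^T y
    # One-time O(N^3) eigendecomposition (G is symmetric PSD).
    eigvals, V = np.linalg.eigh(G)
    Vc = V.T @ c
    weights = {}
    for lam in lambdas:
        # (G + lam*I)^{-1} c via the eigenbasis: scale by 1/(eigval + lam).
        weights[lam] = V @ (Vc / (eigvals + lam))
    return weights
```

Each solution matches a direct solve of $$(X^TX + \lambda I)w = X^Ty$$, but the loop over candidate $$\lambda$$ values never touches the $$M$$ training samples again; the paper's contribution extends this reuse to $$K$$-fold cross-validation and to incremental feature-set updates.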