Support vector machines and related classification models require the solution of convex optimization problems with one or more regularization hyper-parameters. Typically, the hyper-parameters are selected to minimize the cross-validated estimate of the model's out-of-sample classification error. This cross-validation optimization problem can be formulated as a bilevel program in which the outer-level objective minimizes the average number of misclassified points across the cross-validation folds, subject to inner-level constraints requiring that the classification function for each fold be (exactly or nearly) optimal for the selected hyper-parameters. Feature selection is incorporated into the bilevel program in the form of bound constraints on the weights. The resulting bilevel problem is converted to a mathematical program with linear equilibrium constraints, which is solved using state-of-the-art optimization methods. This approach is significantly more versatile than commonly used grid-search procedures; in particular, it enables the use of models with many hyper-parameters. Numerical results demonstrate the practicality of this approach for model selection in machine learning.
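For concreteness, here is a minimal sketch of the bilevel program described above; the notation is illustrative rather than taken from the paper. Let $T$ be the number of folds, let $\mathcal{V}_t$ and $\Omega_t$ be the validation and training index sets of fold $t$, let $\lambda \ge 0$ be a single regularization hyper-parameter (the formulation extends to vectors of hyper-parameters), and let $\bar{w} \ge 0$ be the componentwise feature-selection bound on the weights:

\[
\begin{aligned}
\min_{\lambda \ge 0,\ \bar{w} \ge 0,\ (w^t, b^t)_{t=1}^{T}} \quad & \frac{1}{T}\sum_{t=1}^{T} \frac{1}{|\mathcal{V}_t|} \sum_{i \in \mathcal{V}_t} \mathbb{1}\!\left[\, y_i \left( x_i^{\top} w^t + b^t \right) < 0 \,\right] \\
\text{s.t.} \quad & (w^t, b^t) \in \operatorname*{arg\,min}_{-\bar{w} \le w \le \bar{w},\ b} \; \tfrac{1}{2}\lVert w \rVert_2^2 + \lambda \sum_{j \in \Omega_t} \max\!\left(0,\ 1 - y_j \left( x_j^{\top} w + b \right)\right), \qquad t = 1, \dots, T.
\end{aligned}
\]

Because each inner problem is convex, its arg min can be replaced by its first-order (KKT) optimality conditions; the resulting complementarity conditions are the equilibrium constraints, and for a linear classifier with hinge loss they are linear, which is how the bilevel program becomes the mathematical program with linear equilibrium constraints mentioned above. (The 0/1 misclassification count in the outer objective is typically relaxed to a piecewise-linear surrogate to keep the problem tractable.)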