Support vector machines and related classification models require the solution of convex optimization problems with one or more regularization hyper-parameters. Typically, the hyper-parameters are selected to minimize the cross-validated estimate of the model's out-of-sample classification error. This cross-validation optimization problem can be formulated as a bilevel program in which the outer-level objective minimizes the average number of misclassified points across the cross-validation folds, subject to inner-level constraints requiring that the classification function for each fold be (exactly or nearly) optimal for the selected hyper-parameters. Feature selection is incorporated into the bilevel program in the form of bound constraints on the weights. The resulting bilevel problem is converted to a mathematical program with linear equilibrium constraints, which is solved using state-of-the-art optimization methods. This approach is significantly more versatile than commonly used grid-search procedures; in particular, it enables the use of models with many hyper-parameters. Numerical results demonstrate the practicality of this approach for model selection in machine learning.
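For concreteness, here is a minimal sketch of the bilevel program described above; the notation is illustrative rather than taken from the paper. Let $T$ be the number of folds, let $\mathcal{V}_t$ and $\Omega_t$ be the validation and training index sets of fold $t$, let $\lambda \ge 0$ be a single regularization hyper-parameter (the formulation extends to vectors of hyper-parameters), and let $\bar{w} \ge 0$ be the componentwise feature-selection bound on the weights:

\[
\begin{aligned}
\min_{\lambda \ge 0,\ \bar{w} \ge 0,\ (w^t, b^t)_{t=1}^{T}} \quad & \frac{1}{T}\sum_{t=1}^{T} \frac{1}{|\mathcal{V}_t|} \sum_{i \in \mathcal{V}_t} \mathbb{1}\!\left[\, y_i \left( x_i^{\top} w^t + b^t \right) < 0 \,\right] \\
\text{s.t.} \quad & (w^t, b^t) \in \operatorname*{arg\,min}_{-\bar{w} \le w \le \bar{w},\ b} \; \tfrac{1}{2}\lVert w \rVert_2^2 + \lambda \sum_{j \in \Omega_t} \max\!\left(0,\ 1 - y_j \left( x_j^{\top} w + b \right)\right), \qquad t = 1, \dots, T.
\end{aligned}
\]

Because each inner problem is convex, its arg min can be replaced by its first-order (KKT) optimality conditions; the resulting complementarity conditions are the equilibrium constraints, and for a linear classifier with hinge loss they are linear, which is how the bilevel program becomes the mathematical program with linear equilibrium constraints mentioned above. (The 0/1 misclassification count in the outer objective is typically relaxed to a piecewise-linear surrogate to keep the problem tractable.)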