Classification model selection via bilevel programming

  • Authors:
  • G. Kunapuli; K. P. Bennett; Jing Hu; Jong-Shi Pang

  • Affiliations:
  • Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY, USA (Kunapuli, Bennett, Hu); Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA (Pang)

  • Venue:
  • Optimization Methods & Software - Mathematical programming in data mining and machine learning
  • Year:
  • 2008


Abstract

Support vector machines and related classification models require the solution of convex optimization problems that have one or more regularization hyper-parameters. Typically, the hyper-parameters are selected to minimize cross-validated estimates of the out-of-sample classification error of the model. This cross-validation problem can be formulated as a bilevel program in which the outer-level objective minimizes the average number of misclassified points across the cross-validation folds, subject to inner-level constraints requiring that the classification function for each fold be (exactly or nearly) optimal for the selected hyper-parameters. Feature selection is incorporated into the bilevel program in the form of bound constraints on the weights. The resulting bilevel problem is converted to a mathematical program with linear equilibrium constraints, which is solved using state-of-the-art optimization methods. This approach is significantly more versatile than commonly used grid search procedures, enabling, in particular, the use of models with many hyper-parameters. Numerical results demonstrate the practicality of this approach for model selection in machine learning.
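The abstract does not state the bilevel program explicitly; a schematic form consistent with its description, with notation chosen here for illustration (T folds, validation index sets \Omega_t, training index sets \overline{\Omega}_t, hyper-parameter \lambda, and per-fold classifiers (w^t, b^t)), might read:

```latex
\begin{aligned}
\min_{\lambda,\; \{(w^t, b^t)\}_{t=1}^{T}} \quad
  & \frac{1}{T} \sum_{t=1}^{T} \frac{1}{|\Omega_t|}
    \sum_{i \in \Omega_t}
    \mathbf{1}\!\left[\, y_i \left( {x_i}^{\!\top} w^t - b^t \right) < 0 \,\right]
    && \text{(outer level: average CV misclassification)} \\
\text{s.t.} \quad
  & (w^t, b^t) \in \operatorname*{arg\,min}_{w,\, b}\;
    \frac{\lambda}{2} \lVert w \rVert^2
    + \sum_{j \in \overline{\Omega}_t}
      \max\!\left( 0,\; 1 - y_j \left( {x_j}^{\!\top} w - b \right) \right),
    && t = 1, \dots, T
    \quad \text{(inner level: SVM training per fold)} \\
  & -\bar{w} \le w^t \le \bar{w},
    && t = 1, \dots, T
    \quad \text{(feature-selection bound constraints)}
\end{aligned}
```

Replacing each inner-level arg-min with its optimality (equilibrium) conditions is what yields the mathematical program with equilibrium constraints mentioned above; in this sketch the hinge-loss SVM and the symmetric bound \bar{w} are illustrative assumptions, not details given in the abstract.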