Learning sparse classifiers with difference of convex functions algorithms

  • Authors:
  • Cheng Soon Ong; Le Thi Hoai An

  • Affiliations:
  • Department of Computer Science, ETH Zurich, Switzerland; Laboratory of Theoretical and Applied Computer Science, University of Paul Verlaine – Metz, Ile de Saulcy, 57045 Metz, France

  • Venue:
  • Optimization Methods & Software - the 8th International Conference on Optimization: Techniques and Applications
  • Year:
  • 2013

Abstract

Sparsity of a classifier is a desirable condition for high-dimensional data and large sample sizes. This paper investigates the two complementary notions of sparsity for binary classification: sparsity in the number of features and sparsity in the number of examples. Several different losses and regularizers are considered: the hinge loss and ramp loss, and ℓ2, ℓ1, approximate ℓ0, and capped ℓ1 regularization. We propose three new objective functions that further promote sparsity: the capped ℓ1 regularization with hinge loss, and the ramp loss versions of approximate ℓ0 and capped ℓ1 regularization. We derive difference of convex functions algorithms (DCA) for solving these novel non-convex objective functions. The proposed algorithms are shown to converge in a finite number of iterations to a local minimum. Using simulated data and several data sets from the University of California, Irvine (UCI) machine learning repository, we empirically investigate the fraction of features and examples required by the different classifiers.
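As an illustration of the kind of non-convex objective and DC decomposition the abstract refers to, the following sketch uses standard definitions of the hinge loss and the capped ℓ1 regularizer with a cap parameter $a > 0$; the exact formulation and notation in the paper may differ.

\[
\min_{w,\,b}\; \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i (w^\top x_i + b)\bigr) \;+\; \lambda \sum_{j=1}^{d} \min\bigl(|w_j|,\, a\bigr).
\]

Using the identity $\min(|w_j|, a) = |w_j| - \max(|w_j| - a,\, 0)$, the objective splits into a difference of two convex functions $g - h$ with

\[
g(w, b) = \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i (w^\top x_i + b)\bigr) + \lambda \|w\|_1,
\qquad
h(w) = \lambda \sum_{j=1}^{d} \max\bigl(|w_j| - a,\, 0\bigr).
\]

A generic DCA iteration then linearizes $h$ at the current iterate by taking a subgradient $v^k \in \partial h(w^k)$ and solves the convex subproblem

\[
(w^{k+1}, b^{k+1}) \in \arg\min_{w,\,b}\; g(w, b) - \langle v^k, w \rangle,
\]

repeating until the iterates (or objective values) stabilize.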