Sparse conjugate directions pursuit with application to fixed-size kernel models

  • Authors:
  • Peter Karsmakers;Kristiaan Pelckmans;Kris Brabanter;Hugo Hamme;Johan A. Suykens

  • Affiliations:
  • Department of Electrical Engineering, K.U.Leuven, Heverlee, Belgium 3001 and Department IBW, K.H.Kempen (Association K.U.Leuven), Geel, Belgium 2440;Division of Systems and Control, Department of Information Technology, Uppsala University, Uppsala, Sweden 751 05;Department of Electrical Engineering, K.U.Leuven, Heverlee, Belgium 3001;Department of Electrical Engineering, K.U.Leuven, Heverlee, Belgium 3001;Department of Electrical Engineering, K.U.Leuven, Heverlee, Belgium 3001

  • Venue:
  • Machine Learning
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

This work studies an optimization scheme for computing sparse approximate solutions of over-determined linear systems. Sparse Conjugate Directions Pursuit (SCDP) aims to construct a solution using only a small number of nonzero (i.e. nonsparse) coefficients. Motivations of this work can be found in a setting of machine learning where sparse models typically exhibit better generalization performance, lead to fast evaluations, and might be exploited to define scalable algorithms. The main idea is to build up iteratively a conjugate set of vectors of increasing cardinality, in each iteration solving a small linear subsystem. By exploiting the structure of this conjugate basis, an algorithm is found (i) converging in at most D iterations for D-dimensional systems, (ii) with computational complexity close to the classical conjugate gradient algorithm, and (iii) which is especially efficient when a few iterations suffice to produce a good approximation. As an example, the application of SCDP to Fixed-Size Least Squares Support Vector Machines (FS-LSSVM) is discussed resulting in a scheme which efficiently finds a good model size for the FS-LSSVM setting, and is scalable to large-scale machine learning tasks. The algorithm is empirically verified in a classification context. Further discussion includes algorithmic issues such as component selection criteria, computational analysis, influence of additional hyper-parameters, and determination of a suitable stopping criterion.