Eigenvalue problems are ubiquitous in machine learning and statistics, appearing in contexts such as classification and dimensionality reduction. In this paper, we consider a cardinality-constrained variational formulation of the generalized eigenvalue problem, with sparse principal component analysis (PCA) as a special case. Using an l1-norm approximation to the cardinality constraint, previous work has proposed both convex and non-convex solutions to the sparse PCA problem. In contrast, we propose a tighter approximation that is related to the negative log-likelihood of a Student's t-distribution. The problem is then framed as a d.c. (difference of convex functions) program and solved as a sequence of convex programs. We show that the proposed method not only explains more variance with sparse loadings on the principal directions but also scales better than competing methods. We demonstrate these results on a collection of datasets of varying dimensionality, including two high-dimensional gene datasets where the goal is to find a few relevant genes that explain as much variance as possible.
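
To make the idea concrete, the following is a minimal numerical sketch, under stated assumptions, of the kind of iteration the abstract describes for the sparse PCA special case: the cardinality of a loading vector x is replaced by the log-based surrogate sum_i log(1 + |x_i|/eps) / log(1 + 1/eps) (the Student's-t-related approximation, which is tighter than the l1 norm), and the concave penalty is linearized at the current iterate, giving a reweighted, soft-thresholded power step. The names sparse_pca_dc, rho, and eps are illustrative; this simple heuristic stands in for, and does not reproduce, the paper's exact sequence of convex programs.

import numpy as np

def log_surrogate(x, eps=1e-2):
    # Smooth approximation of the cardinality ||x||_0:
    # sum_i log(1 + |x_i|/eps) / log(1 + 1/eps).
    # As eps -> 0 this tends to the exact cardinality.
    return np.sum(np.log1p(np.abs(x) / eps)) / np.log1p(1.0 / eps)

def sparse_pca_dc(A, rho=0.1, eps=1e-2, n_iter=100, tol=1e-6):
    # Hypothetical sketch of the d.c. idea: at each step the concave
    # log penalty is majorized by a weighted l1 term with weights
    # proportional to 1/(eps + |x_i|), and the penalized subproblem is
    # handled with a soft-thresholded power step. The surrogate's
    # 1/log(1 + 1/eps) normalization is folded into rho here.
    x = np.linalg.eigh(A)[1][:, -1]        # leading eigenvector as init
    for _ in range(n_iter):
        w = rho / (eps + np.abs(x))        # linearized log penalty
        y = A @ x                          # power step
        x_new = np.sign(y) * np.maximum(np.abs(y) - w, 0.0)
        if np.linalg.norm(x_new) == 0:
            break                          # penalty drove x to zero
        x_new /= np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((50, 10))
    A = B.T @ B / 50                       # sample covariance matrix
    x = sparse_pca_dc(A, rho=0.05)
    print("nonzero loadings:", np.count_nonzero(np.abs(x) > 1e-8))
    print("explained variance:", float(x @ A @ x))
    print("log surrogate of x:", log_surrogate(x))

Larger rho trades explained variance for sparser loadings; in practice one would sweep rho (or decrease eps across iterations) and compare the variance explained per nonzero loading, which is the trade-off the abstract's experiments evaluate.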