We study the problem of minimizing the expected loss of a linear predictor while constraining its sparsity, i.e., bounding the number of features the predictor uses. Although the resulting optimization problem is generally NP-hard, we consider several approximation algorithms. We analyze their performance, focusing on the trade-off between the accuracy and the sparsity of the learned predictor in different scenarios.
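To make the setting concrete, the constrained problem has the form min_w E[loss(<w, x>, y)] subject to ||w||_0 <= k, and a common family of approximation algorithms builds the support greedily, one feature at a time. The sketch below is a minimal illustration of that idea for the squared loss (an OMP-style forward greedy selection with a fully corrective refit); it is not the specific algorithm analyzed in the paper, and the function name `greedy_sparse_regression` and the sparsity budget `k` are illustrative choices.

```python
import numpy as np

def greedy_sparse_regression(X, y, k):
    """Approximate min_w ||X w - y||^2 subject to ||w||_0 <= k.

    Forward greedy selection: at each step, add the feature most
    correlated with the current residual, then refit least squares
    on the selected support (the "fully corrective" step). The exact
    sparsity-constrained problem is NP-hard in general; this is a
    standard heuristic, shown here purely for illustration.
    """
    n, d = X.shape
    support = []
    w = np.zeros(d)
    residual = y.copy()
    for _ in range(k):
        # Score each feature by its correlation with the residual.
        scores = np.abs(X.T @ residual)
        scores[support] = -np.inf  # exclude already-selected features
        j = int(np.argmax(scores))
        support.append(j)
        # Fully corrective step: least-squares refit on the support.
        w_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        w = np.zeros(d)
        w[support] = w_s
        residual = y - X @ w
    return w, support

# Usage on synthetic data with a 3-sparse ground truth.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(200)
w_hat, S = greedy_sparse_regression(X, y, k=3)
```

Increasing the budget `k` trades sparsity for accuracy: each added feature can only decrease the training loss, which is exactly the trade-off the abstract refers to.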