We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
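The adaptive scaling described above can be sketched as the diagonal variant of the method: each coordinate's step size is divided by the square root of that coordinate's accumulated squared gradients, so rarely-seen but predictive features retain large steps while frequently-updated coordinates are damped. This is a minimal illustrative sketch, not the paper's full proximal framework; the function name `adagrad_diag`, the step size `eta`, the smoothing constant `eps`, and the quadratic test problem are all assumptions chosen for the example.

```python
import numpy as np

def adagrad_diag(grad_fn, x0, eta=0.5, eps=1e-8, n_steps=500):
    """Diagonal adaptive subgradient sketch: per-coordinate learning
    rates eta / sqrt(sum of squared past gradients). Hyperparameters
    here are illustrative, not from the paper."""
    x = np.asarray(x0, dtype=float).copy()
    g_sq = np.zeros_like(x)                    # accumulated squared gradients
    for _ in range(n_steps):
        g = grad_fn(x)
        g_sq += g * g                          # update per-coordinate geometry
        x -= eta * g / (np.sqrt(g_sq) + eps)   # adaptive per-coordinate step
    return x

# Toy problem: minimize f(x) = 0.5 * x^T diag(d) x, whose gradient is d * x.
# The coordinates are badly scaled, which is where per-coordinate
# adaptation helps a single global step size.
d = np.array([10.0, 0.1])
x_min = adagrad_diag(lambda x: d * x, np.array([1.0, 1.0]))
```

Note that on this problem the per-coordinate normalization makes the two trajectories identical despite the 100x difference in curvature, which is exactly the scale-robustness a single global learning rate lacks.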