We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term against proximity to the result of the first phase. This view yields a simple yet effective algorithm that can be used for both batch penalized risk minimization and online learning. Furthermore, the two-phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as l1. We derive concrete and very simple algorithms for minimization of loss functions with l1, l2, squared l2, and l∞ regularization. We also show how to construct efficient algorithms for mixed-norm l1/lq regularization. We further extend the algorithms and give efficient implementations for very high-dimensional data with sparsity. We demonstrate the potential of the proposed framework in a series of experiments with synthetic and natural data sets.
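The two-phase iteration described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: for l1 regularization the instantaneous optimization problem of the second phase has the well-known closed-form soft-thresholding solution, so each iteration is a gradient step followed by a shrinkage step. The function names (`soft_threshold`, `fobos_l1`) and the least-squares example problem are our own choices for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    # Closed-form solution of the second phase for l1:
    #   argmin_w  0.5 * ||w - v||^2 + tau * ||w||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fobos_l1(grad, w0, lam, eta, n_steps):
    """Alternate the two phases: an unconstrained gradient
    descent step on the loss, then the proximal (shrinkage)
    step that trades off the l1 term against proximity."""
    w = w0.astype(float)
    for _ in range(n_steps):
        v = w - eta * grad(w)             # phase 1: gradient step
        w = soft_threshold(v, eta * lam)  # phase 2: instantaneous problem
    return w

# Illustrative use: l1-regularized least squares with a sparse target.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true
grad = lambda w: A.T @ (A @ w - b) / len(b)
w_hat = fobos_l1(grad, np.zeros(20), lam=0.05, eta=0.1, n_steps=500)
```

The shrinkage in the second phase is what produces exact zeros in the iterates, which is why the framework yields sparse solutions under l1-type regularizers.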