Sparse Online Learning via Truncated Gradient

Authors:
John Langford; Lihong Li; Tong Zhang
Venue:
The Journal of Machine Learning Research
Year:
2009

Cited by (33):

Brain state decoding for rapid image retrieval

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
A large-scale active learning system for topical categorization on the web

Proceedings of the 19th international conference on World wide web
Large linear classification when data cannot fit in memory

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Combined regression and ranking

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Training and Testing Low-degree Polynomial Data Mappings via Linear SVM

The Journal of Machine Learning Research
Online learning for multi-task feature selection

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Online knowledge-based support vector machines

ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Efficient and numerically stable sparse learning

ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization

The Journal of Machine Learning Research
A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification

The Journal of Machine Learning Research
Automatic acquisition of huge training data for bio-medical named entity recognition

BioNLP '11 Proceedings of BioNLP 2011 Workshop
Detecting adversarial advertisements in the wild

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Selective block minimization for faster convergence of limited memory large-scale linear models

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation

The Journal of Machine Learning Research
Frequency-aware truncated methods for sparse online learning

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Cross-Lingual Adaptation Using Structural Correspondence Learning

ACM Transactions on Intelligent Systems and Technology (TIST)
Large Linear Classification When Data Cannot Fit in Memory

ACM Transactions on Knowledge Discovery from Data (TKDD)
Recommending routes in the context of bicycling: algorithms, evaluation, and the value of personalization

Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work
Structured sparsity in structured prediction

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Approximate computation and implicit regularization for very large-scale data analysis

PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Towards a unified architecture for in-RDBMS analytics

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Manifold identification in dual averaging for regularized stochastic online learning

The Journal of Machine Learning Research
Online feature selection for mining big data

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Linguistic structure prediction with the sparseptron

XRDS: Crossroads, The ACM Magazine for Students - Scientific Computing
Feature reduction for efficient object detection via l1-norm latent SVM

IScIDE'12 Proceedings of the third Sino-foreign-interchange conference on Intelligent Science and Intelligent Data Engineering
Constrained stochastic gradient descent for large-scale least squares problem

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Ad click prediction: a view from the trenches

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient Nearest-Neighbor Search in the Probability Simplex

Proceedings of the 2013 Conference on the Theory of Information Retrieval
Efficient online learning for multitask feature selection

ACM Transactions on Knowledge Discovery from Data (TKDD)
Sparsity regret bounds for individual sequences in online linear regression

The Journal of Machine Learning Research
Sparse high-dimensional fractional-norm support vector machine via DC programming

Computational Statistics & Data Analysis
Adapting deep RankNet for personalized search

Proceedings of the 7th ACM international conference on Web search and data mining

Abstract

We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss functions. This method has several essential properties: (1) The degree of sparsity is continuous: a parameter controls the rate of sparsification, ranging from no sparsification to total sparsification. (2) The approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. (3) The approach works well empirically. We apply the approach to several data sets and find that, for data sets with large numbers of features, substantial sparsity is discoverable.
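The abstract does not spell out the update rule, so the following is only an illustrative sketch under stated assumptions. It takes an ordinary stochastic gradient step and, every K steps, shrinks each weight whose magnitude is at most a threshold theta toward zero by K*eta*g, stopping at zero; larger weights are left alone. The names `truncate`, `truncated_gradient_sgd`, `g`, `theta`, `K`, the squared loss, and the toy data are hypothetical choices for illustration, not taken from this page. With theta set to infinity the truncation becomes L1-style soft-thresholding, which is one way to read property (2); raising g moves the weights from dense toward all-zero, in the spirit of property (1).

```python
import numpy as np


def truncate(v, alpha, theta):
    """Hypothetical truncation operator (assumption, not quoted from the paper):
    entries with |v_j| <= theta are shrunk toward 0 by alpha and clipped at 0;
    entries with |v_j| > theta are left unchanged."""
    out = v.copy()
    small = np.abs(v) <= theta
    out[small] = np.sign(v[small]) * np.maximum(np.abs(v[small]) - alpha, 0.0)
    return out


def truncated_gradient_sgd(X, y, eta=0.01, g=0.0, theta=np.inf, K=1, epochs=5):
    """Online least squares with periodic truncation (illustrative sketch).
    g controls the sparsification rate: g = 0 means no truncation at all."""
    n, d = X.shape
    w = np.zeros(d)
    step = 0
    for _ in range(epochs):
        for i in range(n):
            step += 1
            # Gradient of the squared loss 0.5 * (w . x_i - y_i)^2.
            grad = (w @ X[i] - y[i]) * X[i]
            w = w - eta * grad
            # Every K steps, apply the accumulated shrinkage K * eta * g.
            if step % K == 0:
                w = truncate(w, K * eta * g, theta)
    return w


def make_toy_data(n=200, d=20, seed=0):
    """Toy regression data: only the first 3 of d features are informative."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = np.zeros(d)
    w_true[:3] = [2.0, -3.0, 1.5]
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y


X, y = make_toy_data()
for g in [0.0, 0.5, 5.0, 1e6]:
    w = truncated_gradient_sgd(X, y, g=g)
    print(f"g={g:g}: {np.count_nonzero(w)} of {w.size} weights nonzero")
```

Running the loop prints how many weights survive at each value of g. With g = 0 the truncation is a no-op and the learner is plain online least squares. With a very large g every weight is zeroed at each step, so the two ends of the sweep correspond to "no sparsification" and "total sparsification".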