Online supervised learning with L1-regularization has recently gained attention because it generally requires less computation time and lower space complexity than batch learning methods. However, simple L1-regularization in an online setting has the side effect that rare features tend to be truncated more than necessary, even though feature frequency is highly skewed in many applications. We developed a new family of L1-regularization methods based on the history of previous updates for loss minimization in linear online learning settings. Our methods can identify and retain infrequently occurring but informative features at the same computational cost and convergence rate as previous work. Moreover, we combined our methods with a cumulative penalty model to derive models that are more robust to noisy data. We applied our methods to several datasets and empirically evaluated their performance; the experimental results showed that our frequency-aware truncated models improved prediction accuracy.
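To make the core idea concrete, below is a minimal sketch of online learning with frequency-aware L1 truncation. It is not the authors' exact update rule: the hinge loss, the function name `frequency_aware_truncate`, the counts-proportional penalty scaling, and all parameter names (`lam`, `eta`, `epochs`) are illustrative assumptions. The sketch only shows the general mechanism the abstract describes: soft-thresholding weights after each update, with a smaller penalty applied to features that have been seen less often, so that rare but informative features survive truncation.

```python
import numpy as np

def frequency_aware_truncate(w, counts, lam, eta):
    """Soft-threshold each weight, scaling the L1 penalty by the
    feature's observed frequency (hypothetical rule: rarer features
    receive a smaller penalty and are not truncated prematurely)."""
    total = counts.sum()
    if total == 0:
        return w
    per_feature_penalty = eta * lam * counts / total
    return np.sign(w) * np.maximum(np.abs(w) - per_feature_penalty, 0.0)

def train(X, y, lam=0.01, eta=0.1, epochs=5):
    """Online training of a linear classifier with hinge loss,
    applying frequency-aware truncation after every example."""
    n, d = X.shape
    w = np.zeros(d)
    counts = np.zeros(d)                 # per-feature occurrence counts
    for _ in range(epochs):
        for i in range(n):
            x_i, y_i = X[i], y[i]
            counts += (x_i != 0)         # track how often each feature fires
            if y_i * x_i.dot(w) < 1.0:   # hinge-loss subgradient step
                w += eta * y_i * x_i
            w = frequency_aware_truncate(w, counts, lam, eta)
    return w

# Toy usage: labels in {-1, +1}; feature 2 is rare but predictive.
X = np.array([[1.0, 0.5, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.2, 0.9, 0.0]])
y = np.array([1, 1, -1, -1])
print(train(X, y))
```

Under a plain truncated-gradient scheme, every weight would be shrunk by the same amount `eta * lam` at each step, which disproportionately erases weights that are updated only rarely; scaling the penalty by relative frequency, as sketched above, is one simple way to counteract that bias.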