Constrained stochastic gradient descent for large-scale least squares problem

Authors:
Yang Mu;Wei Ding;Tianyi Zhou;Dacheng Tao
Affiliations:
University of Massachusetts Boston, Boston, MA, USA;University of Massachusetts Boston, Boston, MA, USA;University of Technology Sydney, Sydney, Australia;University of Technology Sydney, Sydney, Australia
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Abstract

The least squares problem is one of the most important regression problems in statistics, machine learning, and data mining. In this paper, we present the Constrained Stochastic Gradient Descent (CSGD) algorithm for solving large-scale least squares problems. CSGD improves on Stochastic Gradient Descent (SGD) by imposing a provable constraint: the linear regression line passes through the mean point of all the data points. This yields the best regret bound, $O(\log{T})$, and the fastest convergence speed among all first-order approaches. Empirical comparisons with SGD and other state-of-the-art approaches demonstrate the effectiveness of CSGD. An example is also given to show how CSGD can be used to improve SGD-based least squares solvers and achieve better performance.
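The mean-point constraint can be checked directly for least squares with an intercept. In the notation below (ours, not necessarily the paper's), $\mathbf{x}_i \in \mathbb{R}^d$ are the inputs, $y_i$ the targets, $\bar{\mathbf{x}}$ and $\bar{y}$ their means over $n$ points, and the model is $\hat{y} = \mathbf{w}^\top\mathbf{x} + b$. Setting the derivative of the objective with respect to $b$ to zero shows that every minimizer passes through the mean point:

```latex
\begin{aligned}
F(\mathbf{w}, b) &= \frac{1}{2n}\sum_{i=1}^{n}\bigl(\mathbf{w}^\top\mathbf{x}_i + b - y_i\bigr)^2,\\
\frac{\partial F}{\partial b} &= \frac{1}{n}\sum_{i=1}^{n}\bigl(\mathbf{w}^\top\mathbf{x}_i + b - y_i\bigr) = \mathbf{w}^\top\bar{\mathbf{x}} + b - \bar{y},\\
\frac{\partial F}{\partial b}\Big|_{(\mathbf{w}^*, b^*)} = 0 &\;\Longrightarrow\; \bar{y} = \mathbf{w}^{*\top}\bar{\mathbf{x}} + b^{*}.
\end{aligned}
```

A solver can therefore restrict its search to the hyperplane $\bar{y} = \mathbf{w}^\top\bar{\mathbf{x}} + b$ without excluding the optimum.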
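As an illustration only, the following Python sketch shows one way an SGD update could be combined with the mean-point constraint; it is not the authors' CSGD algorithm. Assumptions: the intercept is folded into the weights by appending a constant feature 1.0, the "mean point" is tracked as a running mean of the samples seen so far, and the constraint is enforced by Euclidean projection of the iterate onto the hyperplane $\bar{\mathbf{x}}^\top\mathbf{w} = \bar{y}$ after each plain SGD step. The names `csgd_least_squares` and `dot`, the constant step size `lr`, and the other parameters are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def csgd_least_squares(data, lr=0.1, epochs=200, seed=0):
    """Hypothetical sketch of SGD with a mean-point constraint.

    Minimizes sum_i (w . x_i + b - y_i)^2 over a list of (x, y) pairs, where
    x is a list of floats. The intercept b is folded in as the last weight by
    appending a constant feature 1.0 to every input. Returns (w, b).
    """
    rng = random.Random(seed)
    d = len(data[0][0]) + 1          # feature count plus the bias feature
    w = [0.0] * d
    mean_x = [0.0] * d               # running mean of augmented inputs
    mean_y = 0.0                     # running mean of targets
    n_seen = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = list(data[i][0]) + [1.0]
            y = data[i][1]
            # Incrementally update the running mean point (x_bar, y_bar).
            n_seen += 1
            for j in range(d):
                mean_x[j] += (x[j] - mean_x[j]) / n_seen
            mean_y += (y - mean_y) / n_seen
            # Plain SGD step on the per-sample loss 0.5 * (w . x - y)^2.
            err = dot(w, x) - y
            for j in range(d):
                w[j] -= lr * err * x[j]
            # Project w onto the hyperplane {w : mean_x . w = mean_y}, i.e.
            # force the fitted model through the current mean point.
            # mean_x[-1] is always 1.0, so norm2 >= 1 and never zero.
            gap = dot(mean_x, w) - mean_y
            norm2 = dot(mean_x, mean_x)
            for j in range(d):
                w[j] -= gap / norm2 * mean_x[j]
    return w[:-1], w[-1]


# Usage: noise-free points from y = 2*x1 - 3*x2 + 0.5 on a 5x5 grid.
points = [([i / 4, j / 4], 2 * (i / 4) - 3 * (j / 4) + 0.5)
          for i in range(5) for j in range(5)]
weights, bias = csgd_least_squares(points)
print(weights, bias)
```

On this noise-free example the iterate should approach `w ≈ [2, -3]` and `b ≈ 0.5`. The projection removes one degree of freedom from the search because the exact minimizer satisfies the constraint for the full-data mean; with a running mean, the hyperplane only approximates that constraint early in the stream.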