Solving large scale linear prediction problems using stochastic gradient descent algorithms

Authors:
Tong Zhang
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year:
2004

Abstract

Linear prediction methods, such as least squares for regression, and logistic regression and support vector machines for classification, have been extensively used in statistics and machine learning. In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. This class of methods, related to online algorithms such as the perceptron, is both efficient and very simple to implement. We obtain a numerical rate of convergence for such algorithms and discuss its implications. Experiments on text data are provided to demonstrate the numerical and statistical consequences of our theoretical findings.
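
To make the setting concrete, here is a minimal sketch of SGD applied to one regularized linear prediction method, L2-regularized logistic regression. It is an illustration under stated assumptions, not the algorithm or step-size schedule analyzed in the paper: the function names `sgd_logistic` and `predict`, the constant learning rate `eta`, the regularization weight `lam`, the epoch count, and the synthetic two-dimensional data are all hypothetical choices made for this example.

```python
import math
import random


def sgd_logistic(data, dim, lam=0.01, eta=0.1, epochs=20, seed=0):
    """SGD on (lam/2)*||w||^2 + average of log(1 + exp(-y * <w, x>)).

    data: list of (x, y) pairs; x is a list of `dim` floats, y is -1 or +1.
    lam, eta, epochs, and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order
        for i in order:
            x, y = data[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Derivative of log(1 + exp(-margin)) with respect to margin,
            # written in a form that avoids overflow for large |margin|.
            if margin > 0:
                e = math.exp(-margin)
                g = -e / (1.0 + e)
            else:
                g = -1.0 / (1.0 + math.exp(margin))
            # One gradient step on the regularized single-example objective.
            w = [wj - eta * (lam * wj + g * y * xj) for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the linear score, with ties sent to +1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Synthetic data (an assumption for this sketch): the label is the sign of
# x[0] - x[1], and points too close to the decision boundary are dropped.
data_rng = random.Random(1)
data = []
while len(data) < 200:
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    s = x[0] - x[1]
    if abs(s) > 0.2:
        data.append((x, 1 if s > 0 else -1))

w = sgd_logistic(data, dim=2)
accuracy = sum(predict(w, x) == y for x, y in data) / len(data)
```

Each update touches a single example, which is what makes this family of methods cheap per step and simple to implement. Replacing the logistic derivative with a hinge-loss subgradient or a squared-error gradient would give the analogous update for a linear SVM or for least squares.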