Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently applied to structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; in both cases, the dual corresponds to minimizing a convex function subject to simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation than the batch EG algorithm to reach a desired accuracy, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be applied efficiently to problems such as sequence learning or natural language parsing. We perform an extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.
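To make the core update concrete, the following is a minimal sketch of a single EG step on the probability simplex, the building block behind both the batch and online variants described above. This is an illustration under stated assumptions, not the paper's full algorithm: the function name, the fixed step size, and the toy quadratic dual objective are all hypothetical choices for the example.

```python
import numpy as np

def eg_update(u, grad, eta):
    """One exponentiated gradient step on the probability simplex.

    u    : current dual variables, a point on the simplex (u >= 0, u.sum() == 1)
    grad : gradient of the convex dual objective at u
    eta  : step size (fixed here for simplicity; the paper analyzes its choice)
    """
    # Multiplicative update followed by renormalization keeps u on the simplex,
    # so the simplex constraints of the dual never need explicit projection.
    v = u * np.exp(-eta * grad)
    return v / v.sum()

# Toy usage (illustrative): minimize a convex quadratic Q(u) = 0.5 * u^T A u
# over the simplex, standing in for the log-linear or max-margin dual.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A @ A.T                      # positive semidefinite, so Q is convex
u = np.full(5, 0.2)              # uniform starting point on the simplex
for _ in range(200):
    u = eg_update(u, A @ u, eta=0.1)   # grad of Q at u is A @ u
print(u, 0.5 * u @ A @ u)
```

In the structured settings the paper targets, each training example carries one such distribution over its (exponentially many) candidate structures, and the key point of the factored EG updates is that this multiplicative step can be carried out implicitly over parts of the structure rather than over all structures explicitly.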