Conditional log-linear models are a commonly used method for structured prediction, so efficient learning of their parameters is an important problem. This paper describes an exponentiated gradient (EG) algorithm for training such models. EG is applied to the convex dual of the maximum likelihood objective; this yields both sequential and parallel update algorithms, where the sequential algorithm updates parameters in an online fashion. We provide a convergence proof for both algorithms. Our analysis also simplifies previous results on EG for max-margin models and leads to a tighter bound on convergence rates. Experiments on a large-scale parsing task show that the proposed algorithm converges much faster than conjugate-gradient and L-BFGS approaches, in terms of both the optimization objective and test error.
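The core of an EG update is a multiplicative step on dual variables that live on a probability simplex: each variable is scaled by the exponentiated negative gradient and the result is renormalized, so the iterate stays a valid distribution. A minimal sketch of one such step (the function name, toy gradient values, and learning rate are illustrative, not taken from the paper):

```python
import math

def eg_update(alpha, grad, eta):
    """One exponentiated-gradient step on the probability simplex.

    alpha: current dual variables (a distribution over candidate outputs)
    grad:  gradient of the dual objective with respect to alpha
    eta:   learning rate

    The multiplicative update followed by renormalization keeps
    alpha non-negative and summing to one.
    """
    updated = [a * math.exp(-eta * g) for a, g in zip(alpha, grad)]
    z = sum(updated)
    return [u / z for u in updated]

# Toy example: three candidate structures, uniform starting distribution.
alpha = [1 / 3, 1 / 3, 1 / 3]
grad = [0.5, -0.2, 0.1]
alpha = eg_update(alpha, grad, eta=1.0)
# Mass shifts toward the coordinate with the most negative gradient.
```

In the paper's setting there is one such distribution per training example (over its candidate labelings); the sequential algorithm applies this kind of update to one example's dual variables at a time, while the parallel algorithm updates all of them at once.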