Piecewise training for structured prediction

  • Authors:
  • Charles Sutton; Andrew McCallum

  • Affiliations:
  • Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA (both authors)

  • Venue:
  • Machine Learning
  • Year:
  • 2009

Abstract

A drawback of structured prediction methods is that parameter estimation requires repeated inference, which is intractable for general structures. In this paper, we present an approximate training algorithm called piecewise training (PW) that divides the factors into tractable subgraphs, which we call pieces, and trains each piece independently. Piecewise training can be interpreted as approximating the exact likelihood using belief propagation, and different ways of making this interpretation yield different insights into the method. We also present an extension of piecewise training, called piecewise pseudolikelihood (PWPL), designed for the case in which variables have large cardinality. On several real-world natural language processing tasks, piecewise training performs better than Besag's pseudolikelihood and sometimes comparably to exact maximum likelihood. In addition, PWPL performs similarly to PW and better than standard pseudolikelihood, while being five to ten times more computationally efficient than batch maximum likelihood training.
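
To make the idea concrete, below is a minimal sketch of a piecewise objective for a log-linear model: each factor is treated as its own "piece" and normalized locally over its own variable assignments, so the global partition function never needs to be computed. The data layout (a feature matrix per piece and the index of the observed assignment) is a simplifying assumption for illustration, not the paper's actual implementation.

```python
import numpy as np

def piecewise_objective(pieces, theta):
    """Sum of locally normalized log-likelihoods, one term per piece.

    pieces : list of (features, observed_idx) pairs, where features is a
             (num_local_assignments, num_params) array holding one feature
             vector per joint assignment of the piece's variables, and
             observed_idx is the row matching the training data.
             (Hypothetical representation for illustration.)
    theta  : (num_params,) parameter vector shared across all pieces.
    """
    ll = 0.0
    for features, observed_idx in pieces:
        scores = features @ theta                   # score each local assignment
        log_z_local = np.logaddexp.reduce(scores)   # local partition function only
        ll += scores[observed_idx] - log_z_local    # locally normalized term
    return ll

# Tiny usage example: two pieces over binary pairs (4 assignments each),
# sharing a 3-dimensional parameter vector.
rng = np.random.default_rng(0)
pieces = [(rng.normal(size=(4, 3)), 2), (rng.normal(size=(4, 3)), 0)]
theta = np.zeros(3)
print(piecewise_objective(pieces, theta))  # -2 * log(4) at theta = 0
```

Because each term depends only on a single piece, the objective and its gradient decompose over factors, which is what avoids the repeated global inference that makes exact maximum likelihood intractable on general structures.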