Distributed training strategies for the structured perceptron

  • Authors:
  • Ryan McDonald; Keith Hall; Gideon Mann

  • Affiliations:
  • Google, Inc., New York / Zurich

  • Venue:
  • HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
  • Year:
  • 2010

Abstract

Perceptron training is widely applied in the natural language processing community for learning complex structured models. Like all structured prediction learning frameworks, the structured perceptron can be costly to train as training complexity is proportional to inference, which is frequently non-linear in example sequence length. In this paper we investigate distributed training strategies for the structured perceptron as a means to reduce training times when computing clusters are available. We look at two strategies and provide convergence bounds for a particular mode of distributed structured perceptron training based on iterative parameter mixing (or averaging). We present experiments on two structured prediction problems -- named-entity recognition and dependency parsing -- to highlight the efficiency of this method.
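To make the iterative parameter mixing idea described in the abstract concrete, here is a minimal sketch in Python. The helper names (`features`, `decode`, `perceptron_epoch`) are illustrative assumptions, not from the paper, and shard-level training is shown sequentially; in an actual cluster each shard's epoch would run in parallel on a separate machine.

```python
# Sketch of iterative parameter mixing for structured perceptron training.
# Assumes user-supplied features(x, y) -> dict and decode(x, w) -> structure
# (hypothetical names). Each epoch: train one perceptron pass per shard from
# the shared weights, then average (mix) the shard weights.

from collections import defaultdict

def perceptron_epoch(shard, w, features, decode):
    """One pass of structured perceptron updates over a single data shard."""
    w = defaultdict(float, w)              # local copy of the mixed weights
    for x, y_gold in shard:
        y_pred = decode(x, w)              # argmax inference under current weights
        if y_pred != y_gold:               # standard perceptron update on a mistake
            for f, v in features(x, y_gold).items():
                w[f] += v
            for f, v in features(x, y_pred).items():
                w[f] -= v
    return w

def iterative_parameter_mixing(shards, features, decode, epochs=10):
    """Repeatedly run one epoch per shard from the shared weights,
    then mix (average) the resulting shard weights."""
    w = defaultdict(float)
    for _ in range(epochs):
        shard_weights = [perceptron_epoch(s, w, features, decode) for s in shards]
        mixed = defaultdict(float)
        for sw in shard_weights:           # uniform mixing coefficients here;
            for f, v in sw.items():        # non-uniform mixtures are also possible
                mixed[f] += v / len(shard_weights)
        w = mixed
    return w
```

The key design point, per the abstract, is that mixing happens after every epoch rather than once after independent training runs; the uniform averaging above is just one choice of mixing weights used for illustration.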