Parameter estimation for statistical parsing models: theory and practice of distribution-free methods

Authors:
Michael Collins
Affiliations:
MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA
Venue:
New developments in parsing technology
Year:
2004

Abstract

A fundamental problem in statistical parsing is the choice of criteria and algorithms used to estimate the parameters in a model. The predominant approach in computational linguistics has been to use a parametric model with some variant of maximum-likelihood estimation. The assumptions under which maximum-likelihood estimation is justified are arguably quite strong. This chapter discusses the statistical theory underlying various parameter-estimation methods, and gives algorithms that depend on alternatives to (smoothed) maximum-likelihood estimation. We first give an overview of results from statistical learning theory. We then show how important concepts from the classification literature (specifically, generalization results based on margins on training data) can be derived for parsing models. Finally, we describe parameter-estimation algorithms that are motivated by these generalization bounds.
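The notion of a margin on training data for a parsing model can be made concrete with a short sketch. It is a minimal illustration under stated assumptions, not the chapter's own formulation: each training sentence is assumed to come with a finite list of candidate parses, each represented as a sparse feature vector, with one candidate marked correct. The margin of a sentence is the linear score of the correct parse minus the highest score among the incorrect parses, optionally divided by the norm of the weight vector. The training loop is a standard perceptron update for ranking candidates, included as one example of a distribution-free estimator; the function names, feature names, and toy data are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Sparse feature vector for one candidate parse: feature name -> count.
Features = Dict[str, float]
# One training example: (candidate parses, index of the correct parse).
Example = Tuple[List[Features], int]


def score(w: Dict[str, float], phi: Features) -> float:
    """Linear score w . phi(x, y) for one candidate parse."""
    return sum(w.get(f, 0.0) * v for f, v in phi.items())


def margins(w: Dict[str, float], data: List[Example]) -> List[float]:
    """Unnormalized margin per sentence: score of the correct parse minus the
    best score among incorrect candidates. Assumes every candidate list
    contains at least one incorrect parse."""
    result = []
    for cands, gold in data:
        best_wrong = max(score(w, c) for j, c in enumerate(cands) if j != gold)
        result.append(score(w, cands[gold]) - best_wrong)
    return result


def normalized_margins(w: Dict[str, float], data: List[Example]) -> List[float]:
    """Margins divided by the Euclidean norm ||w||, so rescaling w has no effect."""
    norm = math.sqrt(sum(v * v for v in w.values()))
    if norm == 0.0:
        raise ValueError("margins are undefined for the zero weight vector")
    return [m / norm for m in margins(w, data)]


def perceptron(data: List[Example], epochs: int = 10) -> Dict[str, float]:
    """Perceptron for ranking candidate parses: on each mistake, add the
    correct parse's features and subtract those of the top-scoring parse.
    Stops early after an epoch with no mistakes."""
    w: Dict[str, float] = {}
    for _ in range(epochs):
        mistakes = 0
        for cands, gold in data:
            # Ties go to the lowest index, as with Python's max().
            best = max(range(len(cands)), key=lambda j: score(w, cands[j]))
            if best != gold:
                mistakes += 1
                for f, v in cands[gold].items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in cands[best].items():
                    w[f] = w.get(f, 0.0) - v
        if mistakes == 0:
            break
    return w


# Hypothetical toy data: two sentences, candidate parses described by rule counts.
DATA: List[Example] = [
    ([{"rule:VP->VP PP": 1.0}, {"rule:NP->NP PP": 1.0}], 1),
    ([{"rule:NP->NP PP": 1.0, "rule:VP->V NP": 1.0},
      {"rule:VP->VP PP": 1.0, "rule:VP->V NP": 1.0},
      {"rule:VP->V NP": 2.0}], 0),
]

if __name__ == "__main__":
    w = perceptron(DATA)
    print(w)
    print(margins(w, DATA), min(normalized_margins(w, DATA)))
```

On this separable toy set the perceptron stops after its second pass, and every training sentence ends with a positive margin. Dividing by the norm of the weight vector makes the margin invariant to rescaling the weights, which is why margin-based analyses work with the normalized form; the smallest normalized margin over the training set is the kind of quantity such generalization bounds are stated in terms of. On data that is not separable the perceptron need not converge, so the `epochs` cap matters.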