Squibs and Discussions: The DOP Estimation Method Is Biased and Inconsistent

Authors: Mark Johnson
Affiliations: Brown University, Providence, RI
Venue: Computational Linguistics
Year: 2002

Abstract

A data-oriented parsing or DOP model for statistical parsing associates fragments of linguistic representations with numerical weights, where these weights are estimated by normalizing the empirical frequency of each fragment in a training corpus (see Bod [1998] and references cited therein). This note observes that this estimation method is biased and inconsistent; that is, the estimated distribution does not in general converge on the true distribution as the size of the training corpus increases.
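
The estimation step described in the abstract can be illustrated with a minimal sketch. It assumes, as in the standard DOP formulation, that fragment counts are normalized among fragments sharing the same root label; the abstract itself only says the frequencies are normalized. The function name `relative_frequency_weights`, the `(root, fragment)` tuple representation, and the toy data are all hypothetical, and the extraction of fragments from treebank trees is omitted.

```python
from collections import Counter, defaultdict


def relative_frequency_weights(fragment_occurrences):
    """Estimate fragment weights by relative frequency.

    Each occurrence is a (root_label, fragment_string) pair. A fragment's
    weight is its count divided by the total count of all fragments that
    share its root label (an assumed normalization scheme).
    """
    counts = Counter(fragment_occurrences)

    # Total number of fragment occurrences per root label.
    root_totals = defaultdict(int)
    for (root, _fragment), count in counts.items():
        root_totals[root] += count

    # Normalize each fragment count by the total for its root label.
    return {
        fragment: count / root_totals[fragment[0]]
        for fragment, count in counts.items()
    }


# Toy, made-up fragment occurrences (not taken from any real corpus).
occurrences = [
    ("S", "(S NP VP)"),
    ("S", "(S NP VP)"),
    ("S", "(S (NP she) VP)"),
    ("NP", "(NP she)"),
]
weights = relative_frequency_weights(occurrences)
```

In this toy input the two S-rooted fragments receive weights 2/3 and 1/3, and the single NP-rooted fragment receives weight 1. The sketch shows only how the weights are computed; it does not by itself demonstrate the bias or inconsistency that the note establishes.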