An all-subtrees approach to unsupervised parsing

Authors:
Rens Bod
Affiliations:
University of St Andrews, North Haugh, Scotland, UK
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Citing 20
Cited 26

Foundations of statistical natural language processing

Foundations of statistical natural language processing
Squibs and discussions: the DOP Estimation method is biased and inconsistent

Computational Linguistics
Estimation of probabilistic context-free grammars

Computational Linguistics
An annotation scheme for free word order languages

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Distributional part-of-speech tagging

EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
A DOP model for semantic interpretation

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
ABL: alignment-based learning

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Computational complexity of probabilistic disambiguation by means of tree-grammars

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
An efficient implementation of a new DOP model

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Building a large-scale annotated Chinese corpus

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A generative constituent-context model for improved grammar induction

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
The unsupervised learning of natural language structure

The unsupervised learning of natural language structure
Unsupervised induction of stochastic context-free grammars using distributional clustering

ConLL '01 Proceedings of the 2001 workshop on Computational Natural Language Learning - Volume 7
Corpus-based induction of syntactic structure: models of dependency and constituency

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Effective self-training for parsing

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
What are the productive units of natural language grammar?: a DOP approach to the automatic identification of constructions

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Unsupervised parsing with U-DOP

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Better k-best parsing

Parsing '05 Proceedings of the Ninth International Workshop on Parsing Technology
Natural language grammar induction with a generative constituent-context model

Pattern Recognition

A Graph Based Method for Building Multilingual Weakly Supervised Dependency Parsers

GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering

ICGI '08 Proceedings of the 9th international colloquium on Grammatical Inference: Algorithms and Applications
Limitations of current grammar induction algorithms

ACL '07 Proceedings of the 45th Annual Meeting of the ACL: Student Research Workshop
Reducing Bias Effects in DOP Parameter Estimation

Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
Darwinised data-oriented parsing: statistical NLP with added sex and death

CACLA '09 Proceedings of the EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition
Superior and efficient fully unsupervised pattern-based concept acquisition using an unsupervised parser

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Automatic selection of high quality parses created by a fully unsupervised parser

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Evaluating unsupervised part-of-speech tagging for grammar induction

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Unsupervised induction of labeled parse trees by clustering with syntactic features

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Two approaches for building an unsupervised dependency parser and their other applications

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
A linguistic investigation into unsupervised DOP

CACLA '07 Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition
Semi-supervised learning of dependency parsers using generalized expectation criteria

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Upper bounds for unsupervised parsing with unambiguous non-terminally separated grammars

CLAGI '09 Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference
Improvements in unsupervised co-occurrence based parsing

CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Improved fully unsupervised parsing with zoomed learning

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Unsupervised induction of tree substitution grammars for dependency parsing

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Effective constituent projection across languages

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Inducing Tree-Substitution Grammars

The Journal of Machine Learning Research
Formal and empirical grammatical inference

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts of ACL 2011
A new unsupervised approach to word segmentation

Computational Linguistics
Reducing the size of the representation for the uDOP-estimate

EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Relaxed cross-lingual projection of constituent syntax

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised dependency parsing without gold part-of-speech tags

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Computational models of language acquisition

CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Unsupervised part-of-speech disambiguation for high frequency words and its influence on unsupervised parsing

CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
Empiricist solutions to nativist puzzles by means of unsupervised TSG

Proceedings of the Workshop on Computational Models of Language Acquisition and Loss

Quantified Score

Hi-index	0.00

Visualization

Abstract

We investigate generalizations of the all-subtrees "DOP" approach to unsupervised parsing. Unsupervised DOP models assign all possible binary trees to a set of sentences and next use (a large random subset of) all subtrees from these binary trees to compute the most probable parse trees. We will test both a relative frequency estimator for unsupervised DOP and a maximum likelihood estimator which is known to be statistically consistent. We report state-of-the-art results on English (WSJ), German (NEGRA) and Chinese (CTB) data. To the best of our knowledge this is the first paper which tests a maximum likelihood estimator for DOP on the Wall Street Journal, leading to the surprising result that an unsupervised parsing model beats a widely used supervised model (a treebank PCFG).