Unsupervised parsing with U-DOP

Authors: Rens Bod
Affiliations: University of St Andrews, St Andrews, Scotland, UK
Venue: CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Year: 2006

Abstract

We propose a generalization of the supervised DOP model to unsupervised learning. This new model, which we call U-DOP, first assigns all possible unlabeled binary trees to a set of sentences and then uses all subtrees from (a large subset of) these binary trees to compute the most probable parse trees. We show how U-DOP can be implemented by a PCFG-reduction technique and report competitive results on English (WSJ), German (NEGRA) and Chinese (CTB) data. To the best of our knowledge, this is the first paper that accurately bootstraps structure for Wall Street Journal sentences of up to 40 words, obtaining roughly the same accuracy as a binarized supervised PCFG. We show that previous approaches to unsupervised parsing have shortcomings in that they constrain either the lexical or the structural context, or both.
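
As a rough illustration of the first step described in the abstract (assigning all possible unlabeled binary trees to a sentence), the sketch below enumerates every binary bracketing of a token sequence. It is not the paper's implementation: the names `binary_trees` and `catalan` are hypothetical, trees are represented as nested tuples purely for convenience, and the subtree extraction and PCFG reduction are not shown.

```python
from functools import lru_cache
from math import comb


def catalan(n):
    """Return the n-th Catalan number."""
    return comb(2 * n, n) // (n + 1)


def binary_trees(tokens):
    """Return every unlabeled binary tree over `tokens` as nested tuples.

    A leaf is the token itself; an internal node is a (left, right) pair.
    Illustrative helper only; the name and representation are assumptions.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def build(i, j):
        # All trees spanning tokens[i:j].
        if j - i == 1:
            return (tokens[i],)
        trees = []
        for k in range(i + 1, j):  # every possible split point
            for left in build(i, k):
                for right in build(k, j):
                    trees.append((left, right))
        return tuple(trees)

    return list(build(0, len(tokens)))


if __name__ == "__main__":
    sentence = "the dog barks loudly".split()
    trees = binary_trees(sentence)
    for tree in trees:
        print(tree)
    # A sentence of n tokens has catalan(n - 1) binary trees.
    print(len(trees), catalan(len(sentence) - 1))
```

The number of trees grows with the Catalan numbers, so full enumeration is only practical for short sentences. This is consistent with the abstract's mention of using "a large subset of" the binary trees.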