Reducing the size of the representation for the uDOP-estimate

Authors:
Christoph Teichmann
Affiliations:
University of Leipzig, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig
Venue:
EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Year:
2011

Citing 18

Effective construction of the synthetic algebra of a recognizable series on trees
Acta Informatica

Parsing inside-out

An annotation scheme for free word order languages
ANLC '97 Proceedings of the fifth conference on Applied natural language processing

Inside-outside reestimation from partially bracketed corpora
ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics

A generative constituent-context model for improved grammar induction
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

An all-subtrees approach to unsupervised parsing
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Estimation of consistent probabilistic context-free grammars
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics

The ruby programming language

Unsupervised parsing with U-DOP
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning

Minimizing deterministic weighted tree automata
Information and Computation

Parsing algorithms based on tree automata
IWPT '09 Proceedings of the 11th International Conference on Parsing Technologies

Simple, accurate parsing with an all-fragments grammar
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Profiting from mark-up: hyper-text annotations for guided parsing
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Improvements in unsupervised co-occurrence based parsing
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning

Identifying patterns for unsupervised grammar induction
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning

Simple unsupervised grammar induction from raw text with cascaded finite state models
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Punctuation: making a point in unsupervised dependency parsing
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning

An overview of probabilistic tree transducers for natural language processing
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing

Cited 1

Empiricist solutions to nativist puzzles by means of unsupervised TSG
Proceedings of the Workshop on Computational Models of Language Acquisition and Loss

Abstract

The unsupervised Data Oriented Parsing (uDOP) approach has repeatedly been reported to achieve state-of-the-art performance in parsing experiments on different corpora. At the same time, the approach is demanding in both computation time and memory. This paper describes an approach that decreases these demands. First, the problem is translated into the generation of probabilistic bottom-up tree automata (pBTA). Then it is explained how solving two standard problems for these automata reduces the size of the grammar. This reduction of the grammar size through efficient algorithms for pBTAs is the main contribution of the paper. Experiments suggest that it reduces grammar size by a factor of 2. The paper also suggests some extensions of the original uDOP algorithm that are made possible or aided by the use of tree automata.
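
For readers unfamiliar with the formalism, the following is a minimal sketch of how a weighted bottom-up tree automaton assigns a weight to a tree. It is not the construction used in the paper: the rule table `RULES`, the final weights `FINAL`, the state names `q` and `f`, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy automaton (an assumption, not taken from the paper).
# A rule maps (node label, tuple of child states) to weighted parent states.
RULES = {
    ("a", ()): {"q": 1.0},
    ("b", ()): {"q": 1.0},
    ("X", ("q", "q")): {"q": 0.5, "f": 0.5},
}
# Weight for ending a run in a given state at the root.
FINAL = {"f": 1.0}


def state_weights(tree, rules):
    """Return {state: summed weight of all runs that reach it at the root}."""
    label, children = tree
    child_tables = [state_weights(child, rules) for child in children]
    result = defaultdict(float)
    # Enumerate every combination of states for the children; for a leaf
    # this yields exactly one empty combination.
    for combo in product(*(table.items() for table in child_tables)):
        child_states = tuple(state for state, _ in combo)
        weight = 1.0
        for _, child_weight in combo:
            weight *= child_weight
        for state, rule_weight in rules.get((label, child_states), {}).items():
            result[state] += weight * rule_weight
    return dict(result)


def tree_weight(tree, rules, final):
    """Total weight of a tree: root-state weights times final weights."""
    table = state_weights(tree, rules)
    return sum(w * final.get(state, 0.0) for state, w in table.items())


if __name__ == "__main__":
    small = ("X", [("a", []), ("b", [])])
    print(tree_weight(small, RULES, FINAL))
```

Trees are written as `(label, children)` pairs. The automaton processes leaves first and combines child states upward, which is the sense in which it is "bottom-up". Reducing the number of states and rules in such an automaton without changing the weights it assigns is the kind of size reduction the abstract refers to.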