Using an annotated corpus as a stochastic grammar

Authors:
Rens Bod
Affiliations:
University of Amsterdam, Amsterdam
Venue:
EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Year:
1993

Citing 5

The ATIS spoken language systems pilot corpus
HLT '90 Proceedings of the workshop on Speech and Natural Language

Inside-outside reestimation from partially bracketed corpora
ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics

Probabilistic tree-adjoining grammar as a framework for statistical natural language processing
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2

Stochastic lexicalized tree-adjoining grammars
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2

A computational model of language performance: Data Oriented Parsing
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 3

Cited 19

Do all fragments count?
Natural Language Engineering

Context-sensitive spoken dialogue processing with the DOP model
Natural Language Engineering

Evaluating two methods for Treebank grammar compaction
Natural Language Engineering

Experiments with corpus-based LFG specialization
ANLC '00 Proceedings of the sixth conference on Applied natural language processing

Review of "Statistical language learning" by Eugene Charniak. The MIT Press 1993.
Computational Linguistics

The problem of computing the most probable tree in data-oriented parsing and stochastic tree grammars
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics

A DOP model for semantic interpretation
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics

A probabilistic corpus-driven model for lexical-functional analysis
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1

Parsing algorithms and metrics
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics

Parsing with the shortest derivation
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1

Towards a more careful evaluation of broad coverage parsing systems
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1

What are the productive units of natural language grammar?: a DOP approach to the automatic identification of constructions
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning

Theoretical evaluation of estimation methods for data-oriented parsing
EACL '06 Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations

Bayesian learning of a tree substitution grammar
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Simple, accurate parsing with an all-fragments grammar
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Judging grammaticality with tree substitution grammar derivations
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2

The surprising variance in shortest-derivation parsing
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2

Judging grammaticality with count-induced tree substitution grammars
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

Toward Tree Substitution Grammars with latent annotations
WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

Abstract

In Data Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to a statistical model in which the probability of a parse equals the sum of the probabilities of all its derivations. In (Scha, 1990) an informal introduction to DOP is given, while (Bod, 1992a) provides a formalization of the theory. In this paper we compare DOP with other stochastic grammars in the context of Formal Language Theory. It is proved that not every DOP model has a strongly equivalent stochastic CFG that also assigns the same probabilities to the parses. We show that the maximum probability parse can be estimated in polynomial time by applying Monte Carlo techniques. The model was tested on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments yield 96% test set parsing accuracy.
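
The two central ideas of the abstract, that a parse's probability is the sum over its derivations and that the best parse can be estimated by sampling, can be illustrated with a small Python sketch. This is a toy illustration under stated assumptions, not the algorithm from the paper: the `DERIVATIONS` table, its probabilities, and both function names are invented here, and the derivations are simply listed rather than built from corpus subtrees.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper): every derivation of one input
# string, given as (parse it yields, probability of that derivation).
DERIVATIONS = [
    ("parse_A", 0.25),
    ("parse_A", 0.20),
    ("parse_A", 0.15),
    ("parse_B", 0.40),
]


def exact_parse_probabilities(derivations):
    """Sum the derivation probabilities separately for each parse."""
    totals = defaultdict(float)
    for parse, prob in derivations:
        totals[parse] += prob
    return dict(totals)


def monte_carlo_best_parse(derivations, n_samples, rng):
    """Estimate the most probable parse by sampling derivations.

    Derivations are drawn in proportion to their probability; the parse
    that is generated most often is returned together with the counts.
    """
    parses = [parse for parse, _ in derivations]
    weights = [prob for _, prob in derivations]
    sampled = rng.choices(parses, weights=weights, k=n_samples)
    counts = Counter(sampled)
    best_parse, _ = counts.most_common(1)[0]
    return best_parse, counts


# Usage: compare the exact sums with the sampled estimate.
exact = exact_parse_probabilities(DERIVATIONS)
estimate, sample_counts = monte_carlo_best_parse(
    DERIVATIONS, n_samples=2000, rng=random.Random(0)
)
print(exact)
print(estimate, dict(sample_counts))
```

In this invented example the single most probable derivation belongs to `parse_B`, while the most probable parse is `parse_A` once its three derivations are summed. That gap is why picking the best derivation is not enough. In a real DOP model the derivations are too numerous to enumerate, so a sampler would have to generate them from the corpus subtrees directly, a step this sketch leaves out.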