The problem of computing the most probable tree in data-oriented parsing and stochastic tree grammars

Authors:
Rens Bod
Affiliations:
University of Amsterdam, Amsterdam, The Netherlands
Venue:
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
Year:
1995

Citing 9
Cited 3

Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations

Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations
The ATIS spoken language systems pilot corpus

HLT '90 Proceedings of the workshop on Speech and Natural Language
Data-Oriented Parsing

Data-Oriented Parsing
Generalized probabilistic LR parsing of natural language (Corpora) with unification-based grammars

Computational Linguistics - Special issue on using large corpora: I
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Using an annotated corpus as a stochastic grammar

EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Efficiency, robustness and accuracy in Picky chart parsing

ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
Stochastic lexicalized tree-adjoining grammars

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
A computational model of language performance: Data Oriented Parsing

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 3

Computational complexity of probabilistic disambiguation by means of tree-grammars

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Solving the problem of cascading errors: approximate Bayesian inference for linguistic annotation pipelines

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Inducing Tree-Substitution Grammars

The Journal of Machine Learning Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

We deal with the question as to whether there exists a polynomial time algorithm for computing the most probable parse tree of a sentence generated by a data-oriented parsing (DOP) model. (Scha, 1990; Bod, 1992, 1993a). Therefore we describe DOP as a stochastic tree-substitution grammar (STSG). In STSG, a tree can be generated by exponentially many derivations involving different elementary trees. The probability of a tree is equal to the sum of the probabilities of all its derivations.We show that in STSG, in contrast with stochastic context-free grammar, the Viterbi algorithm cannot be used for computing a most probable tree of a string. We propose a simple modification of Viterbi which allows by means of a "select-random" search to estimate the most probable tree of a string in polynomial time.Experiments with DOP on ATIS show that only in 68% of the cases, the most probable derivation of a string generates the most probable tree of that string. Therefore, the parse accuracy obtained by the most probable trees (96%) is dramatically higher than the parse accuracy obtained by the most probable derivations (65%).It is still an open question whether the most probable tree of a string can be deterministically computed in polynomial time.