Prefix probabilities from stochastic Tree Adjoining Grammars

Authors:
Mark-Jan Nederhof; Anoop Sarkar; Giorgio Satta
Affiliations:
DFKI, Saarbrücken, Germany; Univ. of Pennsylvania, Philadelphia, PA; Univ. di Padova, Padova, Italy
Venue:
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Year:
1998

Abstract

Language models for speech recognition typically use a probability model of the form $\Pr(a_n \mid a_1, a_2, \ldots, a_{n-1})$. Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability $\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w)$, where $w$ ranges over all possible continuations of the prefix $a_1 \cdots a_n$; the conditional probability is then the ratio of the prefix probabilities of $a_1 \cdots a_n$ and $a_1 \cdots a_{n-1}$. The main result of this paper is an algorithm to compute such prefix probabilities given a stochastic Tree Adjoining Grammar (TAG). The algorithm achieves the required computation in $O(n^6)$ time. The probabilities of sub-derivations that do not derive any words in the prefix, but contribute structurally to its derivation, are precomputed to achieve termination. This algorithm enables existing corpus-based estimation techniques for stochastic TAGs to be used for language modelling.
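The relation between the prefix probability and the conditional language-model probability can be illustrated with a brute-force sketch. It assumes a hypothetical, finite distribution over complete sentences (`SENTENCE_PROBS`, `prefix_probability` and `next_word_probability` are illustrative names, not taken from the paper). This is not the paper's TAG algorithm: a real stochastic TAG generally assigns probability to infinitely many strings, so the sum over $w \in \Sigma^*$ cannot be enumerated, which is exactly why a dedicated algorithm with precomputed sub-derivation probabilities is needed.

```python
from fractions import Fraction

# Hypothetical toy distribution over complete sentences, standing in for the
# string distribution a stochastic grammar would define. Because it is finite,
# the sum over all continuations w can be computed here by brute force.
SENTENCE_PROBS = {
    ("the", "dog", "barks"): Fraction(3, 10),
    ("the", "dog", "sleeps"): Fraction(2, 10),
    ("the", "cat", "sleeps"): Fraction(4, 10),
    ("a", "cat", "barks"): Fraction(1, 10),
}


def prefix_probability(prefix, dist=SENTENCE_PROBS):
    """Sum of Pr(prefix + w) over every continuation w, including the empty one."""
    prefix = tuple(prefix)
    k = len(prefix)
    # A sentence contributes if its first k words are exactly the prefix.
    return sum(
        (p for sentence, p in dist.items() if sentence[:k] == prefix),
        Fraction(0),
    )


def next_word_probability(history, word, dist=SENTENCE_PROBS):
    """Pr(a_n | a_1 ... a_{n-1}) as a ratio of two prefix probabilities."""
    denominator = prefix_probability(history, dist)
    if denominator == 0:
        raise ValueError("history has zero prefix probability")
    return prefix_probability(tuple(history) + (word,), dist) / denominator
```

For example, `prefix_probability(("the",))` is 9/10 and `prefix_probability(("the", "dog"))` is 1/2, so `next_word_probability(("the",), "dog")` is (1/2)/(9/10) = 5/9. Exact `Fraction` arithmetic is used so that the toy probabilities sum to exactly 1 and the ratios carry no rounding error.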