Prefix probabilities from stochastic Tree Adjoining Grammars

  • Authors:
  • Mark-Jan Nederhof;Anoop Sarkar;Giorgio Satta

  • Affiliations:
  • DFKI, Saarbrücken, Germany;Univ of Pennsylvania, Philadelphia, PA;Univ. di Padova, Padova, Italy

  • Venue:
  • COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
  • Year:
  • 1998

Abstract

Language models for speech recognition typically use a probability model of the form Pr(a_n | a_1, a_2, ..., a_{n-1}). Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability ∑_{w ∈ Σ*} Pr(a_1 ... a_n w), where w ranges over all possible terminations of the prefix a_1 ... a_n. The main result in this paper is an algorithm to compute such prefix probabilities given a stochastic Tree Adjoining Grammar (TAG). The algorithm achieves the required computation in O(n^6) time. The probabilities of sub-derivations that do not derive any words in the prefix, but contribute structurally to its derivation, are precomputed to achieve termination. This algorithm enables existing corpus-based estimation techniques for stochastic TAGs to be used for language modelling.
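
As a small illustration of how a prefix probability yields a next-word model, the conditional probability of a word given its history is the ratio of two prefix probabilities. The sketch below is not the paper's TAG algorithm: it assumes a hypothetical toy distribution over three complete sentences (the names SENTENCES, prefix_probability, and next_word_probability are invented here) and computes prefix probabilities by brute-force summation.

```python
from fractions import Fraction

# Hypothetical toy distribution over complete sentences (an assumption for
# illustration only; a stochastic TAG would define this implicitly).
SENTENCES = {
    ("the", "dog", "barks"): Fraction(1, 2),
    ("the", "dog", "sleeps"): Fraction(1, 4),
    ("the", "cat", "sleeps"): Fraction(1, 4),
}


def prefix_probability(prefix):
    """Sum the probability of every sentence that starts with `prefix`."""
    prefix = tuple(prefix)
    return sum(
        (p for sentence, p in SENTENCES.items()
         if sentence[:len(prefix)] == prefix),
        Fraction(0),
    )


def next_word_probability(history, word):
    """Pr(word | history) as a ratio of two prefix probabilities."""
    denominator = prefix_probability(history)
    if denominator == 0:
        raise ValueError("history has zero prefix probability")
    return prefix_probability(tuple(history) + (word,)) / denominator


if __name__ == "__main__":
    print(prefix_probability(("the", "dog")))
    print(next_word_probability(("the", "dog"), "barks"))
```

Brute-force enumeration only works because the toy language is finite. For a real stochastic grammar the set of terminations w is infinite, which is why a dedicated algorithm such as the one described in the abstract is needed.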