Precise n-gram probabilities from stochastic context-free grammars

  • Authors: Andreas Stolcke; Jonathan Segal

  • Affiliations: University of California, Berkeley, Berkeley, CA (both authors)

  • Venue: ACL '94: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics
  • Year: 1994

Abstract

We present an algorithm for computing n-gram probabilities from stochastic context-free grammars, a procedure that can alleviate some of the standard problems associated with n-grams (estimation from sparse data, lack of linguistic structure, among others). The method operates via the computation of substring expectations, which in turn is accomplished by solving systems of linear equations derived from the grammar. The procedure is fully implemented and has proved viable and useful in practice.
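
To make the idea of "expectations from linear equations" concrete, here is a minimal sketch of the simplest case only: the expected number of times a single word occurs in a string generated by each nonterminal. It is not the authors' implementation, and it does not cover the general n-gram computation described in the abstract. The toy grammar, the function names (`solve`, `expected_counts`, `unigram_probs`), and the choice to normalize unigram probabilities by expected string length are all assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical toy grammar (not from the paper):
# nonterminal -> list of (rule probability, right-hand side).
GRAMMAR = {
    "S": [(Fraction(1, 2), ("a", "S")), (Fraction(1, 2), ("b",))],
}


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def expected_counts(grammar, word):
    """Expected occurrences of `word` in strings derived from each nonterminal.

    For each nonterminal X the expectation c(X) satisfies
        c(X) = sum over rules X -> body of
               P(rule) * sum over symbols Y in body of ([Y == word] or c(Y)),
    which is the linear system (I - A) c = b built below.
    """
    nts = sorted(grammar)
    index = {x: i for i, x in enumerate(nts)}
    n = len(nts)
    a = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    for x, rules in grammar.items():
        for prob, body in rules:
            for sym in body:
                if sym in grammar:        # nonterminal: contributes c(sym)
                    a[index[x]][index[sym]] -= prob
                elif sym == word:         # terminal matching the target word
                    b[index[x]] += prob
    return dict(zip(nts, solve(a, b)))


def unigram_probs(grammar, start, vocab):
    """Assumed normalization: expected count of a word / expected string length."""
    counts = {w: expected_counts(grammar, w)[start] for w in vocab}
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```

For the toy grammar, the single equation for the word "a" is c(S) = 1/2 * (1 + c(S)), so the expected count is 1; the same holds for "b". Exact fractions are used here only to keep the small example free of rounding noise; a floating-point linear solver would serve for a grammar of realistic size.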