Unsupervised induction of stochastic context-free grammars using distributional clustering

  • Authors: Alexander Clark
  • Affiliations: University of Sussex, Brighton, United Kingdom
  • Venue: CoNLL '01 Proceedings of the 2001 workshop on Computational Natural Language Learning - Volume 7
  • Year: 2001

Abstract

An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags based on local distributional information and selects the clusters that satisfy a novel mutual information criterion. This criterion is shown to be related to the entropy of a random variable associated with the tree structures, and it is demonstrated to select linguistically plausible constituents. The resulting procedure is incorporated into a Minimum Description Length algorithm. The evaluation of unsupervised models is discussed, and results are presented for the algorithm trained on 12 million words of the British National Corpus.
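The abstract does not spell out the distributional statistics or the exact form of the mutual information criterion, so the following is only a minimal sketch under stated assumptions. It assumes that the "local distributional information" of a tag sequence is the pair of tags immediately before and after it, and that the criterion scores a sequence by the mutual information between those two context tags. The function names, the `<s>`/`</s>` boundary markers, and the `max_len` parameter are hypothetical; the clustering and Minimum Description Length steps are not shown.

```python
import math
from collections import Counter


def context_counts(sentences, max_len=3):
    """Map each tag sequence (up to max_len tags) to a Counter of
    (left_tag, right_tag) contexts. Boundary markers are an assumption."""
    contexts = {}
    for tags in sentences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        n = len(padded)
        for i in range(1, n - 1):
            # j is the index of the tag immediately to the right of the sequence
            for j in range(i + 1, min(i + max_len, n - 1) + 1):
                seq = tuple(padded[i:j])
                ctx = (padded[i - 1], padded[j])
                contexts.setdefault(seq, Counter())[ctx] += 1
    return contexts


def mutual_information(ctx_counter):
    """Mutual information (in bits) between the left and right context tags,
    estimated from raw counts with no smoothing."""
    total = sum(ctx_counter.values())
    left, right = Counter(), Counter()
    for (l, r), c in ctx_counter.items():
        left[l] += c
        right[r] += c
    mi = 0.0
    for (l, r), c in ctx_counter.items():
        p_joint = c / total
        p_indep = (left[l] / total) * (right[r] / total)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


if __name__ == "__main__":
    # Toy tagged corpus; the tags are illustrative only.
    corpus = [["DT", "NN", "VBZ"], ["DT", "JJ", "NN", "VBZ"], ["NN", "VBZ", "DT", "NN"]]
    contexts = context_counts(corpus)
    for seq, ctx in sorted(contexts.items()):
        print(seq, round(mutual_information(ctx), 3))
```

Under this reading, a sequence whose left and right neighbours are statistically dependent receives a higher score. On a corpus this small the estimates are not meaningful; the sketch only illustrates the bookkeeping.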