Simple unsupervised grammar induction from raw text with cascaded finite state models

  • Authors:
  • Elias Ponvert;Jason Baldridge;Katrin Erk

  • Affiliations:
  • The University of Texas at Austin, Austin, TX;The University of Texas at Austin, Austin, TX;The University of Texas at Austin, Austin, TX

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

We consider a new subproblem of unsupervised parsing from raw text, unsupervised partial parsing---the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsu-pervised parser, Seginer's (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures; doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.