Simple Unsupervised Identification of Low-Level Constituents

  • Authors:
  • Elias Ponvert;Jason Baldridge;Katrin Erk

  • Venue:
  • ICSC '10 Proceedings of the 2010 IEEE Fourth International Conference on Semantic Computing
  • Year:
  • 2010

Abstract

We present an approach to unsupervised partial parsing: the identification of low-level constituents (which we dub clumps) in unannotated text. We begin by showing that CCLParser (Seginer 2007), an unsupervised parsing model, is particularly adept at identifying clumps. Surprisingly, building a simple right-branching structure above its clumps outperforms the full parser itself, indicating that much of CCLParser's performance comes from good local predictions. Based on this observation, we define a simple bigram model that is competitive with CCLParser for clumping, which further illustrates how important this level of representation is for unsupervised parsing.
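
As an illustration of the right-branching baseline mentioned in the abstract, here is a minimal sketch of how such a structure could be built once clumps have been identified. It is not the authors' implementation. The input representation (a clump as a tuple of words, an unclumped word as a bare string), the choice to keep each clump flat, and the names right_branching and to_brackets are all assumptions made for this example.

```python
def right_branching(units):
    """Combine a sequence of units into a right-branching nested tuple.

    Each unit is either a bare word (str) or a clump (tuple of words).
    Clumps are kept flat; this is an assumption of the sketch.
    """
    if not units:
        raise ValueError("need at least one unit")
    # Start from the rightmost unit and attach each earlier unit on the left.
    tree = units[-1]
    for unit in reversed(units[:-1]):
        tree = (unit, tree)
    return tree


def to_brackets(tree):
    """Render a nested tuple as a bracketed string."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(to_brackets(child) for child in tree) + ")"


# Hypothetical clumper output for "the cat sat on the mat".
sentence = [("the", "cat"), "sat", "on", ("the", "mat")]
print(to_brackets(right_branching(sentence)))
# prints ((the cat) (sat (on (the mat))))
```

Every bracket above the clump level comes from the fixed right-branching rule, so in a setup like this the clump boundaries are the only predicted part of the structure, which is why the baseline isolates the contribution of local predictions.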