Simple Unsupervised Identification of Low-Level Constituents

Authors:
Elias Ponvert, Jason Baldridge, Katrin Erk
Venue:
ICSC '10 Proceedings of the 2010 IEEE Fourth International Conference on Semantic Computing
Year:
2010

Cited by (4):

Simple unsupervised grammar induction from raw text with cascaded finite state models
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Unsupervised syntactic chunking with acoustic cues: computational models for prosodic bootstrapping
CMCL '11 Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics

Capitalization cues improve dependency grammar induction
WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

Three dependency-and-boundary models for grammar induction
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Abstract

We present an approach to unsupervised partial parsing: the identification of low-level constituents (which we dub clumps) in unannotated text. We begin by showing that CCLParser (Seginer 2007), an unsupervised parsing model, is particularly adept at identifying clumps. Surprisingly, a simple right-branching structure built above its clumps outperforms the full parser itself, which indicates that much of CCLParser's performance comes from good local predictions. Based on this observation, we define a simple bigram model that is competitive with CCLParser at clumping, further illustrating how important this level of representation is for unsupervised parsing.
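To make the right-branching baseline concrete, here is a minimal sketch of building a right-branching tree above a set of clumps. It assumes clumps arrive as sorted, non-overlapping, half-open token spans (for example, the low-level constituents predicted by a parser such as CCLParser). Each multi-token clump becomes a flat constituent, tokens outside any clump are treated as single-token units, and the sequence of units is then combined right-branching. The function name `right_branch_over_clumps` and the span representation are hypothetical illustrations, not the paper's implementation.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def right_branch_over_clumps(n_tokens: int, clumps: List[Span]) -> Set[Span]:
    """Return the constituent spans of a right-branching tree built over clumps.

    Assumption (hypothetical interface): clumps are non-overlapping spans
    inside [0, n_tokens). Uncovered tokens become single-token units.
    """
    # Collect the top-level units: each clump, plus every uncovered token.
    units: List[Span] = []
    pos = 0
    for start, end in sorted(clumps):
        while pos < start:
            units.append((pos, pos + 1))
            pos += 1
        units.append((start, end))
        pos = end
    while pos < n_tokens:
        units.append((pos, pos + 1))
        pos += 1

    constituents: Set[Span] = set()
    # Every multi-token clump is kept as a flat constituent.
    for start, end in units:
        if end - start > 1:
            constituents.add((start, end))
    # Right-branching: unit i opens a constituent running to the sentence end.
    for i in range(len(units) - 1):
        constituents.add((units[i][0], n_tokens))
    return constituents


# "the cat sat on the mat" with clumps [the cat] and [the mat]
print(sorted(right_branch_over_clumps(6, [(0, 2), (4, 6)])))
# → [(0, 2), (0, 6), (2, 6), (3, 6), (4, 6)]
```

The resulting bracketing is [[the cat] [sat [on [the mat]]]]: all structure inside the tree comes either from the clumps or from the fixed right-branching rule, which is why such a baseline isolates how much a parser's score depends on its local predictions.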