Unsupervised grammar induction by distribution and attachment

Authors:
David J. Brooks
Affiliations:
University of Birmingham, Birmingham, UK
Venue:
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Year:
2006

Abstract

Distributional approaches to grammar induction are typically inefficient, enumerating large numbers of candidate constituents. In this paper, we describe a simplified model of distributional analysis which uses heuristics to reduce the number of candidate constituents under consideration. We apply this model to a large corpus of over 400,000 words of written English, and evaluate the results using EVALB. We show that the performance of this approach is limited, providing a detailed analysis of learned structure and a comparison with actual constituent-context distributions. This motivates a more structured approach, using a process of attachment to form constituents from their distributional components. Our findings suggest that distributional methods do not generalize enough to learn syntax effectively from raw text, but that attachment methods are more successful.