Part-of-speech induction from scratch
ACL '93 Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics
Inducing syntactic categories by context distribution clustering
CoNLL '00 Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning - Volume 7
Categorizing local contexts as a step in grammatical category induction
CACLA '09 Proceedings of the EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition
Online entropy-based model of lexical category acquisition
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
ROBUS-UNSUP '12 Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP
Hierarchical clustering of word class distributions
WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
Concurrent acquisition of word meaning and lexical categories
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
We present an incremental Bayesian model for the unsupervised learning of syntactic categories from raw text. The model draws information from the distributional cues of words within an utterance, while explicitly bootstrapping its development on its own partially-learned knowledge of syntactic categories. Testing our model on actual child-directed data, we demonstrate that it is robust to noise, learns reasonable categories, manages lexical ambiguity, and in general shows learning behaviours similar to those observed in children.
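The abstract's "distributional cues" refer to the contexts in which a word occurs. As a rough illustration (not the paper's Bayesian model), the sketch below groups words by the distribution of their immediate left/right neighbours; the toy corpus and the cosine-style similarity measure are assumptions for demonstration only.

```python
from collections import defaultdict, Counter

# Toy corpus standing in for child-directed speech (illustrative only).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat ran to a dog".split(),
]

# Collect each word's (left neighbour, right neighbour) context counts.
contexts = defaultdict(Counter)
for sent in corpus:
    padded = ["<s>"] + sent + ["</s>"]
    for i in range(1, len(padded) - 1):
        contexts[padded[i]][(padded[i - 1], padded[i + 1])] += 1

def similarity(w1, w2):
    # Cosine similarity between two words' context-count vectors;
    # words of the same syntactic category tend to score higher.
    c1, c2 = contexts[w1], contexts[w2]
    shared = sum(c1[k] * c2[k] for k in c1)
    norm = (sum(v * v for v in c1.values())
            * sum(v * v for v in c2.values())) ** 0.5
    return shared / norm if norm else 0.0
```

Even on this tiny corpus, "cat" and "dog" (both nouns) share the context (the, sat) and so score higher than the noun-verb pair "cat"/"sat", which shares none.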