Children learn a robust representation of lexical categories at a young age. We propose an incremental model of this process that efficiently groups words into lexical categories based on their local context, using an information-theoretic criterion. We train our model on a corpus of child-directed speech from CHILDES and show that it learns a fine-grained set of intuitive word categories. Furthermore, we propose a novel evaluation approach: we compare the efficiency of our induced categories against other category sets (including traditional part-of-speech tags) on a variety of language tasks. We show that the categories induced by our model typically outperform the other category sets.
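The abstract does not spell out the clustering procedure, but the general idea of incremental, entropy-based category induction can be sketched as follows. This is a toy illustration, not the authors' algorithm: the `threshold` parameter, the greedy assignment rule, and the use of (previous word, next word) pairs as local context are all assumptions made for the sake of the example.

```python
from collections import Counter
import math

def entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

class IncrementalCategorizer:
    """Toy incremental clusterer: each category keeps a frequency
    distribution over local contexts.  A new (word, context) pair joins
    the category whose context entropy it increases the least, or starts
    a new category if every candidate increase exceeds `threshold`.
    (Hypothetical sketch; not the model described in the paper.)"""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.categories = []  # list of (member words, context Counter)

    def add(self, word, context):
        best, best_cost = None, float("inf")
        for idx, (_, ctxs) in enumerate(self.categories):
            merged = ctxs + Counter([context])
            cost = entropy(merged) - entropy(ctxs)  # entropy increase
            if cost < best_cost:
                best, best_cost = idx, cost
        if best is None or best_cost > self.threshold:
            self.categories.append(([word], Counter([context])))
        else:
            words, ctxs = self.categories[best]
            words.append(word)
            ctxs[context] += 1

# Feed tokens with their local contexts, one at a time (incremental):
cat = IncrementalCategorizer(threshold=1.0)
cat.add("dog", ("the", "runs"))
cat.add("cat", ("the", "runs"))   # same context -> joins the same category
```

Because each token is processed once and assigned greedily, the procedure is incremental in the sense the abstract describes; a real model would also need a principled criterion for opening new categories rather than a fixed threshold.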