Children learn a robust representation of lexical categories at a young age. We propose an incremental model of this process that efficiently groups words into lexical categories based on their local context, using an information-theoretic criterion. We train our model on a corpus of child-directed speech from CHILDES and show that it learns a fine-grained set of intuitive word categories. Furthermore, we propose a novel evaluation approach: we compare the efficiency of our induced categories against other category sets (including traditional part-of-speech tags) on a variety of language tasks. We show that the categories induced by our model typically outperform the other category sets.
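The abstract does not spell out the clustering procedure, but the general idea of incremental, entropy-based category induction can be sketched as follows. This is a toy illustration, not the authors' algorithm: the `threshold` parameter, the greedy assignment rule, and the use of (previous word, next word) pairs as local context are all assumptions made for the sake of the example.

```python
from collections import Counter
import math

def entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

class IncrementalCategorizer:
    """Toy incremental clusterer: each category keeps a frequency
    distribution over local contexts.  A new (word, context) pair joins
    the category whose context entropy it increases the least, or starts
    a new category if every candidate increase exceeds `threshold`.
    (Hypothetical sketch; not the model described in the paper.)"""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.categories = []  # list of (member words, context Counter)

    def add(self, word, context):
        best, best_cost = None, float("inf")
        for idx, (_, ctxs) in enumerate(self.categories):
            merged = ctxs + Counter([context])
            cost = entropy(merged) - entropy(ctxs)  # entropy increase
            if cost < best_cost:
                best, best_cost = idx, cost
        if best is None or best_cost > self.threshold:
            self.categories.append(([word], Counter([context])))
        else:
            words, ctxs = self.categories[best]
            words.append(word)
            ctxs[context] += 1

# Feed tokens with their local contexts, one at a time (incremental):
cat = IncrementalCategorizer(threshold=1.0)
cat.add("dog", ("the", "runs"))
cat.add("cat", ("the", "runs"))   # same context -> joins the same category
```

Because each token is processed once and assigned greedily, the procedure is incremental in the sense the abstract describes; a real model would also need a principled criterion for opening new categories rather than a fixed threshold.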