Class-based n-gram models of natural language. Computational Linguistics.
The Journal of Machine Learning Research.
An incremental Bayesian model for learning syntactic categories. CoNLL '08: Proceedings of the Twelfth Conference on Computational Natural Language Learning.
Minimized models for unsupervised part-of-speech tagging. ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 1.
Phrase clustering for discriminative learning. ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 2.
Word representations: a simple and general method for semi-supervised learning. ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
SVD and clustering for unsupervised POS tagging. ACLShort '10: Proceedings of the ACL 2010 Conference Short Papers.
Online entropy-based model of lexical category acquisition. CoNLL '10: Proceedings of the Fourteenth Conference on Computational Natural Language Learning.
Simple type-level unsupervised POS tagging. EMNLP '10: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.
A Bayesian mixture model for part-of-speech induction using multiple features. EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory.
The PASCAL Challenge on Grammar Induction. WILS '12: Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure.
We propose an unsupervised approach to POS tagging in which we first associate each word type with a probability distribution over word classes using Latent Dirichlet Allocation. We then build a hierarchical clustering of the word types with an agglomerative algorithm, defining the distance between clusters as the Jensen-Shannon divergence between the class distributions associated with each word type. To assign a POS tag, we find the tree leaf most similar to the current word and use a prefix of the path leading to that leaf as the tag. This simple labeler outperforms a baseline based on Brown clusters on 9 out of 10 datasets.
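As an illustration (not the authors' code), the clustering step can be sketched as follows. Toy per-word class distributions stand in for the LDA output, pairwise Jensen-Shannon divergences serve as the cluster distance, and SciPy's average-linkage agglomerative clustering builds the hierarchy; cutting the resulting tree at a fixed number of clusters is used here as a stand-in for taking path prefixes as tags. All word lists and distribution values are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree
from scipy.spatial.distance import squareform

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical class distributions for five word types (rows sum to 1);
# in the paper these would come from LDA over word classes.
words = ["the", "a", "dog", "cat", "runs"]
theta = np.array([
    [0.90, 0.05, 0.05],  # determiner-like
    [0.85, 0.10, 0.05],
    [0.10, 0.80, 0.10],  # noun-like
    [0.05, 0.85, 0.10],
    [0.10, 0.10, 0.80],  # verb-like
])

# Pairwise JS divergences, packed into a condensed distance matrix.
n = len(words)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = js_divergence(theta[i], theta[j])

# Agglomerative clustering over the JS distances.
tree = linkage(squareform(dist), method="average")

# Cutting the tree at k clusters approximates using path prefixes of a
# fixed depth as tags; deeper cuts give finer-grained tags.
tags = cut_tree(tree, n_clusters=3).ravel()
print(dict(zip(words, tags)))
```

With these toy distributions the two determiner-like words, the two noun-like words, and the verb-like word each end up in their own cluster.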