Unsupervised part-of-speech tagging employing efficient graph clustering

Authors:
Chris Biemann
Affiliations:
University of Leipzig, Leipzig, Germany
Venue:
COLING ACL '06 Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Year:
2006

Abstract

An unsupervised part-of-speech (POS) tagging system that relies on graph clustering methods is described. Unlike current state-of-the-art approaches, the method itself determines the kind and number of tags. We compute and merge two partitionings of word graphs: one based on the context similarity of high-frequency words, the other on log-likelihood statistics for lower-frequency words. The resulting word clusters serve as a lexicon for training a Viterbi POS tagger, which is then refined by a morphological component. The approach is evaluated on three languages by measuring agreement with existing taggers.
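
To illustrate how clustering a word graph can produce the number of classes on its own, rather than taking it as a parameter, here is a minimal sketch. It uses a simple deterministic label propagation, which is an assumption made for illustration and not necessarily the clustering algorithm used in the paper. The function name `propagate_labels`, the toy words, and the edge weights are all hypothetical; in the setting described above, the weights would come from context similarity or log-likelihood statistics.

```python
from collections import defaultdict


def propagate_labels(edges, max_iters=20):
    """Cluster an undirected weighted graph by label propagation.

    edges: iterable of (node_a, node_b, weight) tuples.
    Returns a dict mapping each node to a cluster label.
    Illustrative sketch only; not the paper's algorithm.
    """
    neighbors = defaultdict(dict)
    for a, b, w in edges:
        neighbors[a][b] = w
        neighbors[b][a] = w

    nodes = sorted(neighbors)
    # Every node starts in its own class, so no class count is given.
    labels = {node: node for node in nodes}

    for _ in range(max_iters):
        changed = False
        for node in nodes:  # fixed visiting order keeps the sketch deterministic
            scores = defaultdict(float)
            for other, w in neighbors[node].items():
                scores[labels[other]] += w
            # The label with the largest summed edge weight wins;
            # ties go to the alphabetically smallest label.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy graph: two determiners, three nouns, one weak cross edge.
toy_edges = [
    ("the", "a", 3.0),
    ("cat", "dog", 2.0),
    ("dog", "house", 2.0),
    ("cat", "house", 2.0),
    ("a", "cat", 0.5),
]
labels = propagate_labels(toy_edges)
clusters = defaultdict(list)
for word, label in sorted(labels.items()):
    clusters[label].append(word)
print(dict(clusters))
```

On this toy graph the procedure settles on two classes without being told how many to find: the weak edge between "a" and "cat" is outweighed by the stronger edges inside each group.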