This paper describes an unsupervised part-of-speech (POS) tagging system that relies on graph clustering methods. Unlike current state-of-the-art approaches, the method itself determines the kind and number of tags. We compute and merge two partitionings of word graphs: one based on the context similarity of high-frequency words, the other on log-likelihood statistics for lower-frequency words. The resulting word clusters serve as a lexicon for training a Viterbi POS tagger, which is then refined by a morphological component. The approach is evaluated on three languages by measuring agreement with existing taggers.
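The abstract does not say which log-likelihood statistic is used for the lower-frequency words. As a minimal sketch, assuming the widely used log-likelihood ratio (G²) over a 2x2 contingency table of co-occurrence counts, the significance of a word pair could be scored as follows. The function name and the count layout (`k11` for joint occurrences, `k12` and `k21` for occurrences of one word without the other, `k22` for neither) are illustrative assumptions, not taken from the paper.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of counts.

    Assumed layout (hypothetical, for illustration only):
      k11: contexts containing both words
      k12: contexts containing the first word but not the second
      k21: contexts containing the second word but not the first
      k22: contexts containing neither word
    """

    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    total = xlogx_sum(n)
    # Equivalent to 2 * sum(O * ln(O / E)) with E = row * col / n.
    return 2.0 * (cells - rows - cols + total)
```

A score near zero indicates that the two words co-occur about as often as chance predicts, while larger scores indicate stronger association. In a pipeline like the one described, such scores could serve as edge weights in the word graph for lower-frequency words, though how the paper actually builds that graph is not stated in the abstract.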