Class-based n-gram models of natural language
Computational Linguistics
Coping with ambiguity and unknown words through probabilistic models
Computational Linguistics - Special issue on using large corpora: II
Tagging English text with a probabilistic model
Computational Linguistics
Automatic rule induction for unknown-word guessing
Computational Linguistics
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Does Baum-Welch re-estimation help taggers?
ANLC '94 Proceedings of the fourth conference on Applied natural language processing
A practical part-of-speech tagger
ANLC '92 Proceedings of the third conference on Applied natural language processing
Distributional part-of-speech tagging
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
Combining distributional and morphological information for part of speech induction
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Inducing syntactic categories by context distribution clustering
ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
A practical solution to the problem of automatic part-of-speech induction from text
ACLdemo '05 Proceedings of the ACL 2005 on Interactive poster and demonstration sessions
Deriving an ambiguous word's part-of-speech distribution from unannotated text
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Unsupervised part-of-speech tagging employing efficient graph clustering
COLING ACL '06 Proceedings of the 21st International Conference on computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Evaluating unsupervised part-of-speech tagging for grammar induction
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Improved unsupervised POS induction through prototype discovery
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Unsupervised Part-of-Speech Tagging in the Large
Research on Language and Computation
Improved unsupervised POS induction using intrinsic clustering quality and a Zipfian constraint
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Research on Language and Computation
Controlling complexity in part-of-speech induction
Journal of Artificial Intelligence Research
CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
ROBUS-UNSUP '12 Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP
Hi-index | 0.00 |
We present a system for unsupervised tagging of words into classes produced by a distributional clustering technique called co-clustering. A hidden Markov model (HMM), trained on the high-frequency terms in the lexicon, is used to "tag" occurrences of low-frequency terms. In experiments using the Wall Street Journal portion of the Penn Treebank, we show that previously reported problems in using Baum-Welch estimation for part-of-speech tagging do not occur in this context. We also show how state-level term emission models can be augmented to account for morphological patterns using features automatically derived from the output of co-clustering. Finally, we consider an alternative means of extending the coverage of the lexicon, in which low-frequency terms are added to the lexicon as types, and compare this approach with the token-level assignments made by the HMM.