Learning syntactic categories using paradigmatic representations of word context

Authors:
Mehmet Ali Yatbaz;Enis Sert;Deniz Yuret
Affiliations:
Koç University, Istanbul, Turkey;Koç University, Istanbul, Turkey;Koç University, Istanbul, Turkey
Venue:
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Year:
2012

Citing 20
Cited 0

Class-based n-gram models of natural language

Computational Linguistics
Tagging English text with a probabilistic model

Computational Linguistics
A stochastic parts program and noun phrase parser for unrestricted text

ANLC '88 Proceedings of the second conference on Applied natural language processing
Distributional part-of-speech tagging

EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
Combining distributional and morphological information for part of speech induction

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Prototype-driven learning for sequence models

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
k-means++: the advantages of careful seeding

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Euclidean Embedding of Co-occurrence Data

The Journal of Machine Learning Research
Unsupervised part-of-speech tagging employing efficient graph clustering

COLING ACL '06 Proceedings of the 21st International Conference on computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Painless unsupervised learning with features

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Phylogenetic grammar induction

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
SVD and clustering for unsupervised POS tagging

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Posterior Regularization for Structured Latent Variable Models

The Journal of Machine Learning Research
Two decades of unsupervised POS induction: how far have we come?

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Latent-descriptor clustering for unsupervised POS induction

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Simple type-level unsupervised POS tagging

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
A hierarchical Pitman-Yor process HMM for unsupervised part of speech induction

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
A Bayesian mixture model for part-of-speech induction using multiple features

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
On the resolution of ambiguities in the extraction of syntactic categories through chunking

Cognitive Systems Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model based on Euclidean co-occurrence embedding combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M word corpus.