Improvements in unsupervised co-occurrence based parsing

Author: Christian Hänig
Affiliation: Daimler AG, Research and Technology, Ulm, Germany
Venue: CoNLL '10: Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Year: 2010

Abstract

This paper presents an algorithm for unsupervised co-occurrence based parsing that improves and extends existing approaches. The algorithm iteratively induces a context-free grammar of the language in question, and the structure of each sentence is output as a hierarchy of constituents. Although the algorithm uses no a priori knowledge about the language, it is able to detect heads, modifiers, and the different ways in which a phrase type can be composed. For evaluation, the algorithm is applied both to manually annotated part-of-speech (POS) tags and to word classes induced by an unsupervised part-of-speech tagger.
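
The abstract does not spell out the procedure, so the following is only a rough sketch of what iterative, co-occurrence based grammar induction over POS tag sequences could look like, not the paper's algorithm. It assumes raw adjacent-pair counts as the co-occurrence score, binary rules only, and greedy merging; the names `induce_rules`, `merge_pair`, `max_rules`, `min_count`, and the `P0`, `P1`, ... phrase labels are all hypothetical.

```python
from collections import Counter


def merge_pair(sent, left, right, new):
    """Replace each non-overlapping occurrence of (left, right) with `new`."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and sent[i] == left and sent[i + 1] == right:
            out.append(new)
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out


def induce_rules(sentences, max_rules=10, min_count=2):
    """Greedily turn the most frequent adjacent symbol pair into a rule.

    Assumption: plain co-occurrence counts stand in for whatever
    significance measure a real system would use.
    """
    sents = [list(s) for s in sentences]
    rules = []
    for step in range(max_rules):
        counts = Counter()
        for s in sents:
            counts.update(zip(s, s[1:]))  # adjacent symbol pairs
        if not counts:
            break
        # On ties, max() keeps the pair that was seen first.
        (left, right), n = max(counts.items(), key=lambda kv: kv[1])
        if n < min_count:
            break
        new = f"P{step}"  # hypothetical label for the induced phrase type
        rules.append((new, left, right))  # rule: new -> left right
        sents = [merge_pair(s, left, right, new) for s in sents]
    return rules, sents


if __name__ == "__main__":
    corpus = [
        ["DT", "NN", "VBD", "DT", "NN"],
        ["DT", "JJ", "NN"],
        ["DT", "NN", "VBD", "DT", "JJ", "NN"],
    ]
    rules, parsed = induce_rules(corpus, max_rules=3)
    for lhs, left, right in rules:
        print(f"{lhs} -> {left} {right}")
```

Each pass rewrites the corpus with the new symbol, so later rules can build on earlier ones; this is what makes the resulting structure hierarchical. Detecting heads and modifiers, as the abstract describes, would need more than this sketch provides.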