Unsupervised induction of labeled parse trees by clustering with syntactic features

Authors:
Roi Reichart;Ari Rappoport
Affiliations:
Hebrew University of Jerusalem;Hebrew University of Jerusalem
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Citing 18
Cited 6

Bayesian learning of probabilistic language models

Bayesian learning of probabilistic language models
Incremental Learning of Context Free Grammars

ICGI '02 Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications
Inducing Probabilistic Grammars by Bayesian Model Merging

ICGI '94 Proceedings of the Second International Colloquium on Grammatical Inference and Applications
A minimum description length approach to grammar inference

Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing
Memory-Based Lexical Acquisition and Processing

Proceedings of the Third International EAMT Workshop on Machine Translation and the Lexicon
Unsupervised language acquisition

Unsupervised language acquisition
TnT: a statistical part-of-speech tagger

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Bayesian grammar induction for language modeling

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Combining distributional and morphological information for part of speech induction

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Building a large-scale annotated Chinese corpus

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A generative constituent-context model for improved grammar induction

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
The unsupervised learning of natural language structure

The unsupervised learning of natural language structure
Corpus-based induction of syntactic structure: models of dependency and constituency

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Annealing structural bias in multilingual weighted grammar induction

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
An all-subtrees approach to unsupervised parsing

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Prototype-driven grammar induction

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
On coreference resolution performance metrics

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Unsupervised parsing with U-DOP

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning

Automatic selection of high quality parses created by a fully unsupervised parser

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
The NVI clustering evaluation measure

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Unsupervised methods for head assignments

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Improved unsupervised POS induction through prototype discovery

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Type level clustering evaluation: new measures and a POS induction case study

CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
treeKL: A distance between high dimension empirical distributions

Pattern Recognition Letters

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present an algorithm for unsupervised induction of labeled parse trees. The algorithm has three stages: bracketing, initial labeling, and label clustering. Bracketing is done from raw text using an unsupervised incremental parser. Initial labeling is done using a merging model that aims at minimizing the grammar description length. Finally, labels are clustered to a desired number of labels using syntactic features extracted from the initially labeled trees. The algorithm obtains 59% labeled f-score on the WSJ10 corpus, as compared to 35% in previous work, and substantial error reduction over a random baseline. We report results for English, German and Chinese corpora, using two label mapping methods and two label set sizes.