Prototype-driven grammar induction

Authors:
Aria Haghighi;Dan Klein
Affiliations:
University of California Berkeley;University of California Berkeley
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Citing 9
Cited 13

Foundations of statistical natural language processing

Foundations of statistical natural language processing
Two Experiments on Learning Probabilistic Dependency Grammars from Corpora

Two Experiments on Learning Probabilistic Dependency Grammars from Corpora
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Inside-outside reestimation from partially bracketed corpora

ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
A generative constituent-context model for improved grammar induction

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
The unsupervised learning of natural language structure

The unsupervised learning of natural language structure
Inducing syntactic categories by context distribution clustering

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Unsupervised induction of stochastic context-free grammars using distributional clustering

ConLL '01 Proceedings of the 2001 workshop on Computational Natural Language Learning - Volume 7
Corpus-based induction of syntactic structure: models of dependency and constituency

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics

Automatic selection of high quality parses created by a fully unsupervised parser

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Unsupervised induction of labeled parse trees by clustering with syntactic features

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Language ID in the context of harvesting language data off the web

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Semi-supervised learning of dependency parsers using generalized expectation criteria

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

The Journal of Machine Learning Research
Wordica: Emergence of linguistic representations for words by independent component analysis

Natural Language Engineering
What's with the attitude?: identifying sentences with attitude in online discussions

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Parser evaluation over local and non-local deep dependencies in a large corpus

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Bootstrapping via graph propagation

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Syntactic transfer using a bilingual lexicon

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Semi-supervised constituent grammar induction based on text chunking information

CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Smoothing for bracketing induction

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Bayesian Constituent Context Model for Grammar Induction

IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

We investigate prototype-driven learning for primarily unsupervised grammar induction. Prior knowledge is specified declaratively, by providing a few canonical examples of each target phrase type. This sparse prototype information is then propagated across a corpus using distributional similarity features, which augment an otherwise standard PCFG model. We show that distributional features are effective at distinguishing bracket labels, but not determining bracket locations. To improve the quality of the induced trees, we combine our PCFG induction with the CCM model of Klein and Manning (2002), which has complementary stengths: it identifies brackets but does not label them. Using only a handful of prototypes, we show substantial improvements over naive PCFG induction for English and Chinese grammar induction.