Inducing a grammar from text has proven to be a notoriously challenging learning task despite decades of research. The primary reason for its difficulty is that, in order to induce plausible grammars, the underlying model must be expressive enough to represent the intricacies of language while remaining readily learnable from data. Most existing work on grammar induction has favoured model simplicity (and thus learnability) over representational capacity by using context-free grammars and first-order dependency grammars, which are not sufficiently expressive to model many common linguistic constructions. We propose a novel compromise: inferring a probabilistic tree substitution grammar, a formalism that allows arbitrarily large tree fragments and can thereby better represent complex linguistic structures. To limit the model's complexity we employ a Bayesian nonparametric prior which biases the model towards a sparse grammar with shallow productions. We demonstrate the model's efficacy on supervised phrase-structure parsing, where we induce a latent segmentation of the training treebank, and on unsupervised dependency grammar induction. In both cases the model uncovers interesting latent linguistic structures while producing competitive results.
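To make the two central ideas concrete, the following is a minimal sketch (not the authors' implementation): a tree substitution grammar derives a parse by substituting elementary trees, which may span several CFG levels, at frontier nonterminals, with a derivation's probability being the product of its fragments' probabilities; and a Pitman-Yor process posterior predictive illustrates the kind of nonparametric prior that interpolates observed fragment counts with a base distribution, favouring a sparse grammar. All fragment names and probabilities below are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)  # empty => frontier node

    def frontier(self):
        """Leaf labels left-to-right (substitution sites or terminals)."""
        if not self.children:
            return [self.label]
        out = []
        for c in self.children:
            out.extend(c.frontier())
        return out

# Two elementary trees: a deep fragment capturing a multi-level
# construction, S -> NP (VP (V saw) NP), and a small lexical fragment.
frag1 = Tree("S", [Tree("NP"),
                   Tree("VP", [Tree("V", [Tree("saw")]), Tree("NP")])])
frag2 = Tree("NP", [Tree("dogs")])

# Toy fragment probabilities, conditioned on the fragment's root label.
prob = {("S", "frag1"): 0.4, ("NP", "frag2"): 0.25}

# One derivation: substitute frag2 at both NP frontier sites of frag1.
# Its probability is the product of the chosen fragments' probabilities.
deriv_prob = prob[("S", "frag1")] * prob[("NP", "frag2")] ** 2

def pyp_predictive(count, tables, n, total_tables, d, alpha, p0):
    """Pitman-Yor (Chinese-restaurant) posterior predictive for one
    fragment: discounted observed counts are interpolated with a base
    distribution p0, which is what biases the induced grammar towards
    few, mostly shallow productions."""
    return (count - d * tables + (alpha + d * total_tables) * p0) / (n + alpha)
```

Under this sketch a fragment such as `frag1` lets the grammar memorise a construction larger than any single CFG rule, while `pyp_predictive` penalises proliferating new fragments unless the data supports them.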