Unsupervised induction of stochastic context-free grammars using distributional clustering

Authors:
Alexander Clark
Affiliations:
University of Sussex, Brighton, United Kingdom
Venue:
CoNLL '01 Proceedings of the 2001 workshop on Computational Natural Language Learning - Volume 7
Year:
2001

Abstract

An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags according to local distributional information and selects the clusters that satisfy a novel mutual information criterion. This criterion is shown to be related to the entropy of a random variable associated with the tree structures, and is demonstrated to select linguistically plausible constituents. This is incorporated into a Minimum Description Length algorithm. The evaluation of unsupervised models is discussed, and results are presented for the algorithm trained on 12 million words of the British National Corpus.
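
To make the idea of a mutual information criterion over tag sequences more concrete, here is a minimal Python sketch. The abstract does not spell out the definition, so the sketch assumes the criterion compares the tag immediately before a candidate sequence with the tag immediately after it. The function name `context_mutual_information`, the boundary symbol `<s>`, and the toy tag sequences are all hypothetical and are not taken from the paper. It computes a plain plug-in estimate, which is biased upward when counts are small.

```python
import math
from collections import Counter


def context_mutual_information(sentences, candidate, boundary="<s>"):
    """Plug-in estimate (in bits) of the mutual information between the tag
    just before and the tag just after each occurrence of `candidate`.

    Assumption: this is one plausible reading of a "mutual information
    criterion" for candidate constituents, not the paper's exact definition.
    """
    candidate = tuple(candidate)
    n = len(candidate)
    pairs = Counter()

    for tags in sentences:
        # Pad so that sentence-initial and sentence-final occurrences
        # still have a left and a right context symbol.
        padded = [boundary] + list(tags) + [boundary]
        for i in range(1, len(padded) - n):
            if tuple(padded[i:i + n]) == candidate:
                pairs[(padded[i - 1], padded[i + n])] += 1

    total = sum(pairs.values())
    if total == 0:
        return 0.0

    # Marginal counts of the left and right context tags.
    left, right = Counter(), Counter()
    for (l, r), count in pairs.items():
        left[l] += count
        right[r] += count

    # Sum p(l, r) * log2( p(l, r) / (p(l) * p(r)) ) over observed pairs.
    mi = 0.0
    for (l, r), count in pairs.items():
        mi += (count / total) * math.log2(count * total / (left[l] * right[r]))
    return mi


# Toy data: the left context fully determines the right context.
dependent = [["A", "DT", "NN", "X"], ["B", "DT", "NN", "Y"]]
# Toy data: left and right contexts vary independently.
independent = [
    ["A", "DT", "NN", "X"],
    ["A", "DT", "NN", "Y"],
    ["B", "DT", "NN", "X"],
    ["B", "DT", "NN", "Y"],
]

mi_dependent = context_mutual_information(dependent, ["DT", "NN"])
mi_independent = context_mutual_information(independent, ["DT", "NN"])
```

On the toy data, the sequence whose left and right contexts are correlated gets a higher score than the one whose contexts are independent. Under the stated assumption, a selection step would keep candidate sequences whose score exceeds some threshold, and the threshold itself would be a further assumption.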