DCG induction using MDL and parsed corpora

Authors:
Miles Osborne
Affiliations:
Univ. of Groningen, Groningen, The Netherlands
Venue:
Learning Language in Logic
Year:
2001

Abstract

We show how partial models of natural language syntax (manually written DCGs, with parameters estimated from a parsed corpus) can be automatically extended when trained upon raw text, using the Minimum Description Length (MDL) principle. We also show how a parsed corpus can be used as an alternative constraint upon learning. Empirical evaluation suggests that a parsed corpus is more informative than an MDL-based prior. However, the best results are achieved when the learner is supervised with both a compression-based prior and a parsed corpus.
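To make the compression-based prior concrete, here is a minimal sketch of a generic two-part MDL score, L(G) + L(D|G), for a set of context-free rules. This is not the paper's encoding or learner. Several choices are assumptions made only for illustration: every rule symbol is coded with a uniform code, rule probabilities are relative frequencies within each left-hand side, rule-usage counts are taken as given (for example, read off the derivations in a parsed corpus), and DCG feature annotations are ignored. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# A rule is (lhs, rhs), e.g. ("NP", ("det", "n")). Hypothetical representation.
Rule = tuple[str, tuple[str, ...]]


def grammar_bits(rules: list[Rule]) -> float:
    """Model cost L(G), assuming a uniform code over all grammar symbols.

    Each rule is written as its LHS, its RHS symbols and an end-of-rule
    marker, so a rule with k RHS symbols costs (k + 2) symbol codes.
    """
    symbols = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    bits_per_symbol = math.log2(len(symbols) + 1)  # +1 for the end marker
    return sum((len(rhs) + 2) * bits_per_symbol for _, rhs in rules)


def data_bits(rule_counts: Counter) -> float:
    """Data cost L(D|G): negative log2-likelihood of the observed derivations.

    Assumes rule probabilities are relative frequencies within each LHS.
    """
    lhs_totals: dict[str, int] = defaultdict(int)
    for (lhs, _), n in rule_counts.items():
        lhs_totals[lhs] += n
    return -sum(
        n * math.log2(n / lhs_totals[lhs])
        for (lhs, _), n in rule_counts.items()
        if n > 0  # unused rules contribute nothing to the likelihood
    )


def description_length(rules: list[Rule], rule_counts: Counter) -> float:
    """Two-part MDL score: smaller means a better grammar/data trade-off."""
    return grammar_bits(rules) + data_bits(rule_counts)


if __name__ == "__main__":
    small = [("S", ("a",)), ("S", ("b",))]
    counts = Counter({("S", ("a",)): 2, ("S", ("b",)): 2})
    # An extra rule the data never uses only adds model cost.
    bigger = small + [("S", ("c",))]
    print(description_length(small, counts))   # 16.0
    print(description_length(bigger, counts) > description_length(small, counts))  # True
```

Under this kind of score, a learner extending a hand-written grammar would keep a candidate rule only if the bits it saves in L(D|G) outweigh the bits it adds to L(G). The parsed-corpus constraint from the abstract is not modelled in this sketch.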