AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 2
This chapter describes our work on grammar induction using the Minimum Description Length (MDL) principle. We start with a diagnostic comparison between a basic best-first MDL induction algorithm and a pseudo-induction process, which reveals problems with the existing MDL-based approach to grammar induction. Based on this comparison, we present a novel two-stage grammar induction algorithm that overcomes a local-minimum problem in the basic algorithm by clustering the left-hand sides of the induced grammar rules with a seed grammar. Preliminary experimental results show that the induction curve of the new algorithm significantly outperforms that of traditional MDL-based grammar induction and, in a diagnostic comparison, comes very close to the ideal case. In addition, the new algorithm induces grammar rules with high precision. Finally, we discuss future research directions for improving both the recall and the precision of the algorithm.
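To make the idea of best-first MDL induction concrete, here is a minimal Python sketch of a generic greedy procedure. It is an illustration under stated assumptions, not the algorithm evaluated in this chapter. The coding scheme (an ideal code built from empirical symbol frequencies), the restriction to binary rules over adjacent symbols, and all names (`code_length`, `description_length`, `rewrite`, `induce`, `expand`, the `N0`, `N1`, ... nonterminal labels, the `</s>` marker) are hypothetical choices made for the example. The two-stage algorithm with seed-grammar clustering is not modeled here.

```python
import math
from collections import Counter

END = "</s>"  # hypothetical end-of-sequence marker, assumed absent from the data


def code_length(sequences):
    """Bits to encode the sequences with an ideal code based on their own
    empirical symbol frequencies (an illustrative assumption)."""
    counts = Counter(sym for seq in sequences for sym in list(seq) + [END])
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values())


def description_length(rules, corpus):
    """Two-part MDL score: cost of the grammar plus cost of the corpus given it."""
    grammar = [[lhs] + list(rhs) for lhs, rhs in rules.items()]
    return code_length(grammar) + code_length(corpus)


def rewrite(seq, pair, lhs):
    """Replace non-overlapping occurrences of `pair` with `lhs`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(lhs)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def induce(corpus):
    """Greedy best-first search: at each step, add the single binary rule that
    lowers the description length the most; stop when no rule lowers it."""
    rules = {}
    corpus = [list(s) for s in corpus]
    best = description_length(rules, corpus)
    while True:
        pairs = {(s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1)}
        lhs = f"N{len(rules)}"  # assumes no terminal symbol uses this name
        candidates = []
        for pair in sorted(pairs):
            new_corpus = [rewrite(s, pair, lhs) for s in corpus]
            new_rules = {**rules, lhs: pair}
            score = description_length(new_rules, new_corpus)
            candidates.append((score, pair, new_corpus))
        if not candidates:
            break
        score, pair, new_corpus = min(candidates, key=lambda c: c[0])
        if score >= best:
            break  # no single rule helps: the search stops at a local minimum
        rules[lhs] = pair
        corpus = new_corpus
        best = score
    return rules, corpus, best


def expand(seq, rules):
    """Expand nonterminals back into terminals to check losslessness."""
    out = []
    for sym in seq:
        if sym in rules:
            out.extend(expand(rules[sym], rules))
        else:
            out.append(sym)
    return out


if __name__ == "__main__":
    toy = [["the", "dog", "runs"], ["the", "dog", "sleeps"],
           ["the", "cat", "runs"], ["the", "dog", "eats"]]
    rules, compressed, bits = induce(toy)
    print(rules, compressed, round(bits, 2))
```

The stopping test in `induce` is where a greedy search of this kind can get stuck: it halts as soon as no single rule lowers the description length, even if a sequence of several rules would. That is one simple way a local minimum can arise in best-first search.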