The statistical induction of stochastic context-free grammars from bracketed corpora with the Inside-Outside algorithm is an appealing method for grammar learning, but the computational complexity of this algorithm has so far made it infeasible to induce a large-scale grammar. Researchers in natural language processing and speech recognition have suggested various methods to reduce the computational complexity and, at the same time, guide the learning algorithm towards a solution, for example by placing constraints on the grammar. We suggest a method that strongly reduces the computational cost of the algorithm without placing constraints on the grammar; it can in principle be combined with any of the grammar constraints suggested in earlier studies. We show that results equivalent to those of earlier research can be achieved with much lower computational effort. Starting from a small grammar, the grammar is incrementally extended while rules that have become obsolete are removed at the same time. We explain the modifications to the algorithm, report experimental results, and compare these to results reported in other publications.
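As a minimal sketch of the idea described above (not the authors' implementation), the step of removing obsolete rules can be pictured as pruning rules whose reestimated probability has fallen below a threshold and renormalizing the probabilities of the surviving rules for each left-hand side. The rule names, probabilities, and threshold below are illustrative assumptions.

```python
def prune_grammar(rules, threshold=1e-3):
    """Drop low-probability rules from a stochastic CFG and renormalize.

    rules: dict mapping (lhs, rhs) -> probability, where rhs is a tuple
    of symbols. Rules whose probability is below `threshold` are treated
    as obsolete and removed.
    """
    kept = {k: p for k, p in rules.items() if p >= threshold}
    # Renormalize per left-hand side so each nonterminal's rule
    # probabilities again sum to 1 after pruning.
    totals = {}
    for (lhs, _), p in kept.items():
        totals[lhs] = totals.get(lhs, 0.0) + p
    return {(lhs, rhs): p / totals[lhs] for (lhs, rhs), p in kept.items()}

# Toy grammar after a reestimation pass (hypothetical numbers):
grammar = {
    ("S", ("NP", "VP")): 0.9995,
    ("S", ("VP",)): 0.0005,       # probability has collapsed: obsolete
    ("NP", ("DT", "NN")): 1.0,
}
pruned = prune_grammar(grammar)
```

In an incremental scheme like the one sketched in the abstract, such a pruning pass would alternate with passes that extend the grammar with new candidate rules, keeping the rule set, and hence the cost of each Inside-Outside iteration, small.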