Inside-outside reestimation from partially bracketed corpora

Authors:
Fernando Pereira;Yves Schabes
Affiliations:
AT&T Bell Laboratories, Murray Hill, NJ;University of Pennsylvania, Philadelphia, PA
Venue:
HLT '91 Proceedings of the workshop on Speech and Natural Language
Year:
1992

Abstract

The inside-outside algorithm for inferring the parameters of a stochastic context-free grammar is extended to take advantage of constituent information in a partially parsed corpus. Experiments on formal and natural language parsed corpora show that the new algorithm can achieve faster convergence and better modelling of hierarchical structure than the original one. In particular, over 90% of the constituents in the most likely analyses of a test set are compatible with test set constituents for a grammar trained on a corpus of 700 hand-parsed part-of-speech strings for ATIS sentences.
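The compatibility measure mentioned in the abstract can be illustrated with a small sketch. It assumes constituents are represented as half-open word-index spans and that a predicted constituent is "compatible" when it does not cross any bracket of the reference bracketing. The names `crosses`, `is_compatible`, and `compatible_fraction` are hypothetical and chosen for illustration; this is not the authors' code, and the exact scoring details used in the paper may differ.

```python
from typing import Iterable, Tuple

# Assumption: a constituent is a half-open interval [start, end) over word positions.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """Return True if the spans overlap without either one containing the other."""
    (i, j), (k, l) = a, b
    return (i < k < j < l) or (k < i < l < j)


def is_compatible(span: Span, brackets: Iterable[Span]) -> bool:
    """A span is compatible with a bracketing if it crosses none of its brackets."""
    return not any(crosses(span, b) for b in brackets)


def compatible_fraction(predicted: Iterable[Span], reference: Iterable[Span]) -> float:
    """Fraction of predicted spans that are compatible with the reference bracketing."""
    predicted = list(predicted)
    if not predicted:
        return 1.0  # Convention assumed here: an empty analysis has nothing incompatible.
    reference = list(reference)
    ok = sum(is_compatible(s, reference) for s in predicted)
    return ok / len(predicted)


if __name__ == "__main__":
    # Hypothetical four-word sentence with reference bracketing ((w0 w1) (w2 w3)).
    reference = [(0, 2), (2, 4), (0, 4)]
    predicted = [(0, 2), (1, 3), (0, 4)]
    print(compatible_fraction(predicted, reference))
```

In this toy example the predicted span (1, 3) straddles the boundary between the two reference constituents, so it is the only one counted as incompatible. The same non-crossing test is one plausible way to restrict which spans a reestimation procedure may consider on a partially bracketed training sentence, since partial bracketings leave any span that crosses no bracket unconstrained.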