Inside-outside reestimation from partially bracketed corpora

Authors:
Fernando Pereira; Yves Schabes
Affiliations:
2D-447, AT&T Bell Laboratories, Murray Hill, NJ; University of Pennsylvania, Philadelphia, PA
Venue:
ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
Year:
1992

Abstract

The inside-outside algorithm for inferring the parameters of a stochastic context-free grammar is extended to take advantage of constituent information (constituent bracketing) in a partially parsed corpus. Experiments on formal and natural language parsed corpora show that the new algorithm can achieve faster convergence and better modeling of hierarchical structure than the original one. In particular, over 90% test set bracketing accuracy was achieved for grammars inferred by our algorithm from a training set of hand-parsed part-of-speech strings for sentences in the Air Travel Information System spoken language corpus. Finally, the new algorithm has better time complexity than the original one when sufficient bracketing is provided.
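
The abstract does not spell out how the bracketing is used. One plausible reading, sketched below in Python, is that the inside pass only fills chart spans that do not cross any bracket in the partial annotation. This is an illustrative sketch under that assumption, not the authors' implementation: the function names (`crosses`, `compatible`, `inside_prob`), the dictionary-based grammar encoding, and the restriction to Chomsky normal form are all hypothetical choices made for the example.

```python
from collections import defaultdict

def crosses(span, bracket):
    """True if two half-open spans overlap without one containing the other."""
    (i, j), (k, l) = span, bracket
    return i < k < j < l or k < i < l < j

def compatible(span, brackets):
    """A span is usable if it crosses none of the given brackets."""
    return not any(crosses(span, b) for b in brackets)

def inside_prob(words, brackets, unary, binary, start="S"):
    """Inside probability of `words` under a CNF grammar, summing only over
    derivations whose constituents are compatible with `brackets`.

    unary:  {(A, word): prob} for rules A -> word   (assumed encoding)
    binary: {(A, B, C): prob} for rules A -> B C    (assumed encoding)
    """
    n = len(words)
    chart = defaultdict(float)  # (i, j, A) -> inside probability
    # Width-1 spans can never cross a bracket, so they are always filled.
    for i, w in enumerate(words):
        for (a, word), p in unary.items():
            if word == w:
                chart[i, i + 1, a] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            # Skipping incompatible spans is also where the time saving
            # comes from when many brackets are supplied.
            if not compatible((i, j), brackets):
                continue
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    chart[i, j, a] += p * chart[i, k, b] * chart[k, j, c]
    return chart[0, n, start]

# Toy grammar: S -> S S (0.4), S -> a (0.6).
toy_unary = {("S", "a"): 0.6}
toy_binary = {("S", "S", "S"): 0.4}
unconstrained = inside_prob(["a", "a", "a"], [], toy_unary, toy_binary)
bracketed = inside_prob(["a", "a", "a"], [(0, 2)], toy_unary, toy_binary)
```

With the toy grammar, the string `a a a` has two equally probable parses. Supplying the bracket `(0, 2)` rules out the right-branching one, so `bracketed` comes out at half of `unconstrained`. A full reestimation step would need a matching outside pass under the same span restriction, which this sketch leaves out.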