PCFG models of linguistic tree representations

  • Authors: Mark Johnson
  • Affiliations: Brown University
  • Venue: Computational Linguistics
  • Year: 1998

Abstract

The kinds of tree representations used in a treebank corpus can have a dramatic effect on the performance of a parser based on the PCFG estimated from that corpus, causing the estimated likelihood of a tree to differ substantially from its frequency in the training corpus. This paper points out that the Penn II treebank representations are of the kind predicted to have such an effect, and describes a simple node relabeling transformation that improves a treebank PCFG-based parser's average precision and recall by around 8%, or approximately half of the performance difference between a simple PCFG model and the best broad-coverage parsers available today. This performance variation comes about because any PCFG, and hence the corpus of trees from which the PCFG is induced, embodies independence assumptions about the distribution of words and phrases. The particular independence assumptions implicit in a tree representation can be studied theoretically and investigated empirically by means of a tree transformation/detransformation process.
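
As a concrete illustration of the transformation/detransformation idea, the sketch below assumes the node relabeling is parent annotation (each non-terminal is relabeled with its parent's category). The tuple-based tree encoding, the function names, and the relative-frequency rule counter are illustrative assumptions for this sketch, not code or notation from the paper itself.

```python
# Minimal sketch: relabel treebank trees, estimate PCFG rule counts from the
# relabeled trees, and detransform parsed trees back to the original labels.
from collections import Counter

# A tree is a tuple (label, child, child, ...); a leaf is a plain string.

def parent_annotate(tree, parent="ROOT"):
    """Relabel each non-terminal with its parent's label, e.g. NP -> NP^S."""
    if isinstance(tree, str):          # terminal word
        return tree
    label, *children = tree
    return (f"{label}^{parent}",
            *(parent_annotate(c, label) for c in children))

def detransform(tree):
    """Strip the parent annotation so output trees match the original treebank."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return (label.split("^")[0], *(detransform(c) for c in children))

def pcfg_rule_counts(trees):
    """Count local trees (rules) for relative-frequency PCFG estimation."""
    rules = Counter()
    def visit(t):
        if isinstance(t, str):
            return
        label, *children = t
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rules[(label, rhs)] += 1
        for c in children:
            visit(c)
    for t in trees:
        visit(t)
    return rules

if __name__ == "__main__":
    t = ("S", ("NP", ("PRP", "She")), ("VP", ("VBZ", "runs")))
    annotated = parent_annotate(t)
    assert detransform(annotated) == t       # transform is invertible
    print(annotated)
    print(pcfg_rule_counts([annotated]).most_common(3))
```

Because the annotated labels condition each rule on its parent category, the PCFG estimated from the transformed corpus encodes weaker independence assumptions than one estimated from the raw trees, while detransformation keeps the evaluation against the original treebank representations.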