Parsing with treebank grammars: empirical bounds, theoretical models, and the structure of the Penn Treebank

Authors:
Dan Klein;Christopher D. Manning
Affiliations:
Stanford University, Stanford, CA;Stanford University, Stanford, CA
Venue:
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Year:
2001


Abstract

This paper presents empirical studies and closely corresponding theoretical models of the performance of a chart parser exhaustively parsing the Penn Treebank with the Treebank's own CFG. We show that performance is dramatically affected by rule representation and tree transformations, but little by the choice between top-down and bottom-up strategies. We discuss grammatical saturation, including an analysis of the strongly connected components of the phrasal nonterminals in the Treebank, and model how, as sentence length increases, the effective grammar rule size increases as regions of the grammar are unlocked, yielding super-cubic observed time behavior in some configurations.
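
As a rough illustration of the strongly connected component analysis mentioned in the abstract, the sketch below builds a graph over phrasal nonterminals (an edge from A to B whenever B appears on the right-hand side of a rule for A) and groups mutually reachable nonterminals with Tarjan's algorithm. The rule list `TOY_RULES`, the function names, and the choice to treat part-of-speech tags as terminals are all assumptions made for this example; it is not the authors' code, and the toy rules are not taken from the Treebank grammar.

```python
def nonterminal_graph(rules):
    """Edge A -> B when nonterminal B occurs on the right-hand side of a rule for A.

    Symbols that never appear as a left-hand side (here, POS tags) are
    treated as terminals and left out of the graph.
    """
    nonterminals = {lhs for lhs, _ in rules}
    graph = {nt: set() for nt in nonterminals}
    for lhs, rhs in rules:
        for sym in rhs:
            if sym in nonterminals:
                graph[lhs].add(sym)
    return graph


def strongly_connected_components(graph):
    """Tarjan's algorithm (recursive, so only suitable for small graphs)."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph[node]:
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop everything above it.
            comp = set()
            while True:
                top = stack.pop()
                on_stack.discard(top)
                comp.add(top)
                if top == node:
                    break
            components.append(frozenset(comp))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return components


# Hypothetical toy grammar, written as (left-hand side, right-hand side) pairs.
TOY_RULES = [
    ("ROOT", ("S",)),
    ("S", ("NP", "VP")),
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("PP", ("IN", "NP")),
    ("VP", ("VBD", "NP")),
    ("VP", ("VP", "PP")),
    ("VP", ("VBD", "SBAR")),
    ("SBAR", ("IN", "S")),
]

components = strongly_connected_components(nonterminal_graph(TOY_RULES))
for comp in components:
    print(sorted(comp))
```

In this toy grammar, S, VP, and SBAR can all reach one another, as can NP and PP, while ROOT sits alone. A sentence must be long enough to build a category in a component before the rules of that component can apply to it, which is one way to picture regions of the grammar being "unlocked" as sentence length grows.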