Finite-state chart constraints for reduced complexity context-free parsing pipelines

Authors:
Brian Roark; Kristy Hollingshead; Nathan Bodenstab
Affiliations:
Oregon Health & Science University; University of Maryland; Oregon Health & Science University
Venue:
Computational Linguistics
Year:
2012

Abstract

We present methods for reducing the worst-case and typical-case complexity of a context-free parsing pipeline via hard constraints derived from finite-state pre-processing. We perform O(n) predictions to determine whether each word in the input sentence may begin or end a multi-word constituent in chart cells spanning two or more words, and whether the chart cell spanning the word itself may contain single-word constituents. These pre-processing constraints prune the search space for any chart-based parsing algorithm and significantly decrease decoding time. In many cases cell population is reduced to zero, which we term chart cell "closing." We present methods for closing a sufficient number of chart cells to ensure provably quadratic or even linear worst-case complexity of context-free inference. In addition, we apply high-precision constraints to achieve large typical-case speedups, and combine both high-precision and worst-case bound constraints to achieve superior performance on both short and long strings. These bounds on processing are achieved without reducing parsing accuracy, and in some cases accuracy improves. We demonstrate that our method generalizes across multiple grammars and is complementary to other pruning techniques by presenting empirical results for both exact and approximate inference using the exhaustive CKY algorithm, the Charniak parser, and the Berkeley parser. We also report results parsing Chinese, where we achieve the best reported results for an individual model on the commonly reported data set.
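The begin/end idea above can be illustrated with a minimal sketch. It assumes the finite-state pre-processing yields one boolean per word for "may begin a multi-word constituent" and one for "may end a multi-word constituent". The names `can_begin`, `can_end`, `open_multiword_cells`, and `cky_work` are hypothetical and do not come from the paper. A multi-word cell spanning words i..j is treated as closed when word i cannot begin or word j cannot end a multi-word constituent. A CKY-style loop then skips closed cells, along with any split point whose child cell is closed. The sketch only counts the split points visited and uses no grammar. It does not implement the paper's quadratic or linear worst-case bounds, nor its single-word (unary) constraints.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) word indices, inclusive


def open_multiword_cells(can_begin: List[bool], can_end: List[bool]) -> Set[Span]:
    """Return the multi-word cells (span length >= 2) that remain open.

    Hypothetical tagger output: can_begin[i] / can_end[i] say whether word i
    may begin / end a multi-word constituent. Cell (i, j) is closed unless
    word i may begin AND word j may end such a constituent.
    """
    n = len(can_begin)
    if len(can_end) != n:
        raise ValueError("can_begin and can_end must have the same length")
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if can_begin[i] and can_end[j]
    }


def cky_work(n: int, open_cells: Set[Span]) -> int:
    """Count (cell, split point) pairs a CKY loop visits when closed cells are skipped."""

    def available(i: int, j: int) -> bool:
        # Single-word cells are always kept here; multi-word cells must be open.
        return i == j or (i, j) in open_cells

    work = 0
    for length in range(2, n + 1):          # bottom-up over span lengths
        for i in range(n - length + 1):
            j = i + length - 1
            if (i, j) not in open_cells:
                continue                     # closed cell: nothing to build
            for k in range(i, j):            # left child i..k, right child k+1..j
                if available(i, k) and available(k + 1, j):
                    work += 1                # an empty (closed) child can yield nothing
    return work


if __name__ == "__main__":
    begins = [True, False, True, False]
    ends = [False, True, False, True]
    cells = open_multiword_cells(begins, ends)
    print(sorted(cells))                     # [(0, 1), (0, 3), (2, 3)]
    print(cky_work(4, cells))                # 3
    all_open = open_multiword_cells([True] * 4, [True] * 4)
    print(cky_work(4, all_open))             # 10 (unconstrained)
```

In this four-word example, three of the six multi-word cells stay open, and the loop visits 3 split points instead of 10. Skipping a split whose child is closed loses nothing, because a closed cell holds no constituents.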