Accurate parsing with compact tree-substitution grammars: Double-DOP

Authors:
Federico Sangati; Willem Zuidema
Affiliations:
University of Amsterdam, Amsterdam, The Netherlands (both authors)
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Abstract

We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser analyzes new sentences using syntactic fragments of arbitrary size extracted from a treebank, but, crucially, it keeps only those fragments that occur at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can serve as the symbolic backbone of several probabilistic generative models. For parsing, we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easy to replicate. According to standard Parseval metrics, our best model is on par with many state-of-the-art parsers, while offering some complementary benefits: a simple generative probability model and an explicit representation of the larger units of grammar.
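The "at least twice" criterion can be illustrated on a toy treebank. This is a hedged sketch, not the paper's extraction procedure: it enumerates every fragment rooted at every node by brute force, which grows exponentially with tree size and is only feasible for toy data, and then keeps the fragments counted at least twice. The tuple encoding of trees, the convention that a one-element tuple such as `("NP",)` marks a substitution site, and the names `fragments_at`, `internal_nodes` and `recurring_fragments` are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding: a tree is a tuple (label, child, ...), where each child is
# either a subtree or a word (str). In a fragment, a one-element tuple such as
# ("NP",) marks a substitution site where another fragment can be attached.


def fragments_at(node):
    """Every fragment rooted at `node` (brute force, exponential in tree size)."""
    label, children = node[0], node[1:]
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always kept with their parent
        else:
            # Either cut at the child (substitution site) or expand it further.
            options.append([(child[0],)] + fragments_at(child))
    return [(label,) + combo for combo in product(*options)]


def internal_nodes(tree):
    """Yield every non-word node of a tree, root first."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from internal_nodes(child)


def recurring_fragments(treebank, min_count=2):
    """Count fragments over the treebank and keep those seen at least min_count times."""
    counts = Counter()
    for tree in treebank:
        for node in internal_nodes(tree):
            counts.update(fragments_at(node))
    return {frag: n for frag, n in counts.items() if n >= min_count}


treebank = [
    ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks"))),
    ("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "sleeps"))),
]
recurring = recurring_fragments(treebank)
```

Here `("D", "the")` and `("NP", ("D",), ("N",))` survive because both trees contain them, while `("N", "dog")` occurs only once and is discarded.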
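The abstract does not say how fragment probabilities are estimated in each of its generative models. One classic DOP choice, shown here purely as an assumed illustration, is the relative frequency estimate: a fragment's probability is its count divided by the total count of fragments sharing its root label, and a derivation's probability is the product of the probabilities of its fragments. The function names and toy counts below are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical encoding: fragments are tuples (label, child, ...); ("N",) is a
# substitution site and plain strings are words.


def relative_frequency(counts):
    """P(f) = count(f) / total count of fragments with the same root label."""
    root_totals = defaultdict(int)
    for frag, c in counts.items():
        root_totals[frag[0]] += c
    return {frag: c / root_totals[frag[0]] for frag, c in counts.items()}


def derivation_probability(derivation, probs):
    """A derivation is a sequence of fragments; its probability is their product."""
    return math.prod(probs[frag] for frag in derivation)


counts = {
    ("NP", ("D",), ("N",)): 3,
    ("NP", ("D", "the"), ("N",)): 1,
    ("D", "the"): 2,
    ("N", "dog"): 1,
    ("N", "cat"): 1,
}
probs = relative_frequency(counts)
p = derivation_probability([("NP", ("D", "the"), ("N",)), ("N", "dog")], probs)
```

The same tree for "the dog" can also be derived from `("NP", ("D",), ("N",))`, `("D", "the")` and `("N", "dog")`; in DOP the probability of a parse tree is the sum over all of its derivations, which is why several derivations of one tree matter.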
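The abstract states that parsing relies on a transform-backtransform step so that standard PCFG tools can be reused, but it does not spell out the encoding. The sketch below assumes one simple scheme: each fragment becomes two rules, `root -> ID` and `ID -> frontier`, where the unique `ID` (named `root@index` here, a hypothetical convention) keeps fragments with identical frontiers apart, and a lookup table maps each ID back to its fragment. After parsing, `backtransform` re-expands every ID node into the full fragment, plugging the parsed subtrees into its substitution sites from left to right. The flat parse in the usage lines is written by hand as a stand-in for a PCFG parser's output.

```python
# Hypothetical encoding (an assumption for this sketch): trees and fragments are
# tuples (label, child, ...); children are subtrees or words (str); a one-element
# tuple such as ("N",) is a substitution site.


def frontier(fragment):
    """Leaves of a fragment, left to right: words and substitution-site labels."""
    leaves = []
    for child in fragment[1:]:
        if isinstance(child, str):
            leaves.append(child)
        elif len(child) == 1:
            leaves.append(child[0])
        else:
            leaves.extend(frontier(child))
    return leaves


def transform(fragments):
    """Map each fragment to rules root -> ID and ID -> frontier, plus a lookup table."""
    rules, table = [], {}
    for i, frag in enumerate(fragments):
        frag_id = f"{frag[0]}@{i}"  # hypothetical naming scheme for unique IDs
        rules.append((frag[0], (frag_id,)))
        rules.append((frag_id, tuple(frontier(frag))))
        table[frag_id] = frag
    return rules, table


def backtransform(tree, table):
    """Expand a parse over the flat grammar back into a tree over the original labels."""
    _, (frag_id, *kids) = tree  # shape: (root, (ID, frontier items...))
    # Non-word kids fill the fragment's substitution sites, in left-to-right order.
    subtrees = iter([backtransform(k, table) for k in kids if not isinstance(k, str)])

    def fill(node):
        if isinstance(node, str):
            return node
        if len(node) == 1:  # substitution site: plug in the next parsed subtree
            return next(subtrees)
        return (node[0],) + tuple(fill(c) for c in node[1:])

    return fill(table[frag_id])


fragments = [
    ("S", ("NP", ("D", "the"), ("N",)), ("VP",)),
    ("N", "dog"),
    ("VP", ("V", "barks")),
]
rules, table = transform(fragments)
# Hand-written stand-in for a PCFG parser's output over the flat grammar.
flat_parse = ("S", ("S@0", "the", ("N", ("N@1", "dog")), ("VP", ("VP@2", "barks"))))
tree = backtransform(flat_parse, table)
```

The intermediate ID symbol is what makes the mapping reversible: without it, two different fragments with the same root and frontier would collapse into one CFG rule.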
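The Parseval scores mentioned in the abstract compare the labeled constituents of a predicted tree with those of the gold tree. Below is a simplified sketch of labeled bracketing precision, recall and F1 that skips preterminals; the reference scorer used in practice (evalb) applies further conventions, such as punctuation handling and label equivalences, that are not reproduced here. The tuple encoding and the names `brackets` and `parseval` are assumptions for this example.

```python
from collections import Counter

# Hypothetical encoding: trees are tuples (label, child, ...); words are str.


def brackets(tree):
    """Multiset of labeled spans (label, start, end); preterminals are not scored."""
    spans = Counter()

    def walk(node, start):
        children = node[1:]
        if len(children) == 1 and isinstance(children[0], str):
            return start + 1  # preterminal over a single word
        end = start
        for child in children:
            end = end + 1 if isinstance(child, str) else walk(child, end)
        spans[(node[0], start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def parseval(gold, predicted):
    """Labeled bracketing precision, recall and F1 of `predicted` against `gold`."""
    g, t = brackets(gold), brackets(predicted)
    matched = sum((g & t).values())  # multiset intersection of labeled spans
    precision = matched / sum(t.values())
    recall = matched / sum(g.values())
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


gold = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
predicted = ("S", ("D", "the"), ("VP", ("N", "dog"), ("V", "barks")))
precision, recall, f1 = parseval(gold, predicted)
```

Here only the S bracket over all three words matches, so precision is 1/2, recall is 1/3 and F1 is 0.4.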