Accurate parsing with compact tree-substitution grammars: Double-DOP

  • Authors:
  • Federico Sangati;Willem Zuidema

  • Affiliations:
  • University of Amsterdam, Amsterdam, The Netherlands;University of Amsterdam, Amsterdam, The Netherlands

  • Venue:
  • EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-of-the-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.