PLCFRS parsing of English discontinuous constituents

  • Authors: Kilian Evang; Laura Kallmeyer

  • Affiliations: University of Groningen; University of Düsseldorf

  • Venue: IWPT '11: Proceedings of the 12th International Conference on Parsing Technologies
  • Year: 2011

Abstract

This paper proposes parsing non-local dependencies in English directly. To this end, we use probabilistic linear context-free rewriting systems (PLCFRS) for data-driven parsing, following recent work on parsing German. To do so, we first transform the Penn Treebank annotation of non-local dependencies into an annotation that uses crossing branches. The resulting treebank can be used for PLCFRS-based parsing. Our evaluation shows that, compared to PCFG parsing with the same techniques, PLCFRS parsing yields slightly better results. In particular, when evaluating only the parsing results for long-distance dependencies, the PLCFRS approach with discontinuous constituents recognizes about 88% of the dependencies of type *T* and *T*-PRN encoded in the Penn Treebank. Even the evaluation results for local dependencies, which can in principle be captured by a PCFG-based model, are better with our PLCFRS model. This demonstrates that, by discarding information on non-local dependencies, the PCFG model loses important information on syntactic dependencies in general.
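
To make the idea of replacing trace annotation with crossing branches concrete, the following is a minimal sketch and not the authors' actual transformation. It assumes a toy tree for "What did you see", in which a `*T*-1` trace is co-indexed with the fronted `WHNP-1`. The names `Node`, `find_parent`, `reattach_filler`, `positions` and `blocks` are hypothetical helpers introduced only for this illustration. Reattaching the filler at the trace position gives the VP a yield with a gap, which is the kind of discontinuous constituent that an LCFRS can describe and a CFG cannot.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    position: Optional[int] = None  # token index for words; None otherwise


def find_parent(root, target):
    """Return the parent of `target` (compared by identity), or None."""
    for child in root.children:
        if child is target:
            return root
        found = find_parent(child, target)
        if found is not None:
            return found
    return None


def reattach_filler(root, filler, trace_holder):
    """Move `filler` from its surface position into the slot of `trace_holder`."""
    old_parent = find_parent(root, filler)
    old_parent.children = [c for c in old_parent.children if c is not filler]
    new_parent = find_parent(root, trace_holder)
    slot = next(i for i, c in enumerate(new_parent.children) if c is trace_holder)
    new_parent.children[slot] = filler


def positions(node):
    """Sorted token positions covered by `node` (traces cover nothing)."""
    if node.position is not None:
        return [node.position]
    return sorted(p for c in node.children for p in positions(c))


def blocks(node):
    """Group the covered positions into maximal contiguous runs."""
    runs = []
    for p in positions(node):
        if runs and p == runs[-1][-1] + 1:
            runs[-1].append(p)
        else:
            runs.append([p])
    return runs


# Toy Penn-Treebank-style tree for "What did you see" (assumed example).
what = Node("WHNP-1", [Node("What", position=0)])
trace = Node("NP", [Node("*T*-1")])
vp = Node("VP", [Node("see", position=3), trace])
sq = Node("SQ", [Node("did", position=1), Node("NP", [Node("you", position=2)]), vp])
root = Node("SBARQ", [what, sq])

reattach_filler(root, what, trace)

vp_blocks = blocks(vp)  # the VP now covers "What" and "see" with a gap between them
sq_blocks = blocks(sq)  # the SQ covers all four tokens contiguously
```

In this sketch the number of runs returned by `blocks` corresponds to what is usually called the fan-out of a constituent: the VP has two blocks after the move, while the SQ still has one. A real conversion would also have to handle label indices, empty categories other than `*T*`, and the `*T*-PRN` cases, none of which are modelled here.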