Bounding the maximal parsing performance of non-terminally separated grammars

  • Authors:
  • Franco M. Luque; Gabriel Infante-Lopez

  • Affiliations:
  • Universidad Nacional de Córdoba & Conicet, Córdoba, Argentina (both authors)

  • Venue:
  • ICGI'10: Proceedings of the 10th International Colloquium on Grammatical Inference: Theoretical Results and Applications
  • Year:
  • 2010

Abstract

Unambiguous Non-Terminally Separated (UNTS) grammars have good learnability properties but are too restrictive to be used for natural language parsing. We present a generalization of UNTS grammars, called Unambiguous Weakly NTS (UWNTS) grammars, that preserves the learnability properties. We then study the problem of using them to parse natural language and of evaluating them against a gold treebank. If the target language is not UWNTS, there is an upper bound on the parsing performance that any UWNTS grammar can achieve. In this paper we develop methods to find upper bounds for the unlabeled F1 performance that any UWNTS grammar can achieve over a given treebank. We define a new metric, show that its optimization is NP-hard but solvable with specialized software, and show how to translate the result into a bound for the F1. We report experiments on the WSJ10 corpus, finding an F1 bound of 76.1% for UWNTS grammars over the POS-tag alphabet.
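
The bound discussed in the abstract concerns unlabeled F1 over bracketings, i.e., precision and recall computed on the sets of constituent spans a grammar proposes versus those in the gold treebank. The sketch below is only an illustration of that standard evaluation, not code from the paper; the function name unlabeled_f1 and the toy spans are hypothetical.

```python
# Illustrative sketch (not from the paper): unlabeled F1 of a candidate
# bracketing against a gold bracketing, each given as a set of (start, end)
# constituent spans over the same sentence.

def unlabeled_f1(candidate_spans, gold_spans):
    """Return the unlabeled F1 between two sets of constituent spans."""
    candidate = set(candidate_spans)
    gold = set(gold_spans)
    matched = len(candidate & gold)              # spans present in both bracketings
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: the candidate recovers two of the three gold spans
# and proposes one spurious span, giving precision = recall = 2/3.
gold = [(0, 5), (0, 2), (2, 5)]
candidate = [(0, 5), (2, 5), (3, 5)]
print(unlabeled_f1(candidate, gold))  # ~0.667
```

The paper's contribution is an upper bound on this quantity over all UWNTS grammars for a given treebank, obtained by optimizing a separate metric and translating the optimum into an F1 bound; that optimization step is not reproduced here.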