A formal approach to subgrammar extraction for NLP

  • Authors:
  • Vlado Kešelj; Nick Cercone

  • Affiliations:
  • Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 1W5, Canada;Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 1W5, Canada

  • Venue:
  • Mathematical and Computer Modelling: An International Journal
  • Year:
  • 2007

Quantified Score

Hi-index 0.98

Abstract

The problem of subgrammar extraction is defined precisely as follows: given a grammar G and a set of words W, find a smallest subgrammar of G that accepts the same set of sentences from W^* as G and produces the same parse trees for each of them. In practical Natural Language Processing applications, the set of words W is obtained from the text unit to be processed. There are practical motivations for performing this operation "just-in-time", i.e., just before processing the text; hence the operation must be automatic and efficient. After defining the problem in a general framework, we discuss it for context-free grammars (CFGs) and give an efficient algorithm for that case. We prove that finding the smallest subgrammar for HPSGs is an NP-hard problem, and give an efficient algorithm that solves an easier, approximate version of the problem. We also discuss how the algorithm can be implemented efficiently.
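
To make the CFG case of the problem statement concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes a grammar represented as a list of (left-hand side, right-hand side) rules; the function name `extract_subgrammar` and the toy grammar are hypothetical and chosen only for illustration. The sketch keeps the rules whose terminals all belong to W and then applies the standard removal of useless symbols (non-generating first, then unreachable).

```python
from typing import Dict, List, Set, Tuple

# A rule is (left-hand side, right-hand side symbols). Assumed representation.
Rule = Tuple[str, Tuple[str, ...]]


def extract_subgrammar(rules: List[Rule], nonterminals: Set[str],
                       start: str, words: Set[str]) -> List[Rule]:
    """Keep only rules that can occur in a parse tree of a sentence over `words`."""

    def rhs_ok(rhs: Tuple[str, ...], generating: Set[str]) -> bool:
        # Every nonterminal on the right-hand side must already be generating.
        return all(s in generating for s in rhs if s in nonterminals)

    # Step 1: drop rules that mention a terminal outside W.
    candidate = [(lhs, rhs) for lhs, rhs in rules
                 if all(s in nonterminals or s in words for s in rhs)]

    # Step 2: fixpoint for nonterminals that derive some string in W^*.
    # (Naive iteration; a worklist version would be faster.)
    generating: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in candidate:
            if lhs not in generating and rhs_ok(rhs, generating):
                generating.add(lhs)
                changed = True
    candidate = [(lhs, rhs) for lhs, rhs in candidate if rhs_ok(rhs, generating)]

    # Step 3: keep only rules whose left-hand side is reachable from the start symbol.
    by_lhs: Dict[str, List[Tuple[str, ...]]] = {}
    for lhs, rhs in candidate:
        by_lhs.setdefault(lhs, []).append(rhs)
    reachable, stack = {start}, [start]
    while stack:
        for rhs in by_lhs.get(stack.pop(), []):
            for s in rhs:
                if s in nonterminals and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return [(lhs, rhs) for lhs, rhs in candidate if lhs in reachable]


# Hypothetical toy grammar, for illustration only.
NONTERMINALS = {"S", "NP", "VP", "N", "ADV"}
GRAMMAR: List[Rule] = [
    ("S", ("NP", "VP")),
    ("NP", ("the", "N")),
    ("NP", ("a", "N")),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("VP", ("barks",)),
    ("VP", ("sees", "NP")),
    ("VP", ("ADV", "VP")),
    ("ADV", ("loudly",)),
]

sub = extract_subgrammar(GRAMMAR, NONTERMINALS, "S", {"the", "dog", "barks"})
for lhs, rhs in sub:
    print(lhs, "->", " ".join(rhs))
```

With W = {the, dog, barks}, four rules survive: S -> NP VP, NP -> the N, N -> dog, and VP -> barks. The rule VP -> ADV VP is dropped even though it contains no terminals, because ADV can no longer derive any string over W.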