A formal approach to subgrammar extraction for NLP

Authors:
Vlado Kešelj; Nick Cercone
Affiliations:
Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 1W5, Canada;Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 1W5, Canada
Venue:
Mathematical and Computer Modelling: An International Journal
Year:
2007

Abstract

The problem of subgrammar extraction is defined precisely as follows: given a grammar G and a set of words W, find a smallest subgrammar of G that accepts the same set of sentences from W^* as G and produces the same parse trees for each of them. In practical Natural Language Processing applications, the set of words W is obtained from the text unit to be processed. There are practical motivations for performing this operation "just-in-time", i.e., just before processing the text; hence the operation must be automatic and efficient. After defining the problem in a general framework, we discuss it for context-free grammars (CFGs) and give an efficient algorithm for that case. We prove that finding the smallest subgrammar for Head-driven Phrase Structure Grammars (HPSGs) is an NP-hard problem, and give an efficient algorithm that solves an easier, approximate version of the problem. We also discuss how the algorithm can be implemented efficiently.
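
To make the CFG case concrete, the following is a minimal sketch and not necessarily the algorithm given in the paper. It assumes the standard useless-symbol elimination for context-free grammars: drop every rule that mentions a terminal outside W, drop rules containing nonterminals that can no longer derive any string, and finally drop rules that are unreachable from the start symbol. The function name `extract_subgrammar`, the rule representation, and the toy grammar are all hypothetical and chosen only for illustration; terminals and nonterminals are assumed to be disjoint sets of strings.

```python
from typing import Dict, List, Set, Tuple

# A rule is (left-hand side, tuple of right-hand-side symbols).
Rule = Tuple[str, Tuple[str, ...]]


def extract_subgrammar(rules: List[Rule], nonterminals: Set[str],
                       start: str, words: Set[str]) -> List[Rule]:
    """Return the rules of `rules` usable in some parse of a sentence over `words`."""
    # Step 1: discard rules that contain a terminal not in W.
    candidate = [(lhs, rhs) for lhs, rhs in rules
                 if all(s in nonterminals or s in words for s in rhs)]

    # Step 2: compute generating nonterminals (those deriving some string
    # over W) by naive fixed-point iteration; a worklist would be faster.
    generating: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in candidate:
            if lhs not in generating and all(
                    s not in nonterminals or s in generating for s in rhs):
                generating.add(lhs)
                changed = True
    # Keep only rules whose right-hand-side nonterminals are all generating.
    candidate = [(lhs, rhs) for lhs, rhs in candidate
                 if all(s not in nonterminals or s in generating for s in rhs)]

    # Step 3: keep only rules whose left-hand side is reachable from `start`.
    by_lhs: Dict[str, List[Tuple[str, ...]]] = {}
    for lhs, rhs in candidate:
        by_lhs.setdefault(lhs, []).append(rhs)
    reachable = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for rhs in by_lhs.get(current, []):
            for s in rhs:
                if s in nonterminals and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return [(lhs, rhs) for lhs, rhs in candidate if lhs in reachable]


# Hypothetical toy grammar, for illustration only.
NONTERMINALS = {"S", "NP", "VP", "PP", "Det", "N", "V", "P"}
RULES: List[Rule] = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")), ("NP", ("N",)), ("NP", ("NP", "PP")),
    ("PP", ("P", "NP")),
    ("VP", ("V", "NP")), ("VP", ("V",)),
    ("Det", ("the",)), ("Det", ("a",)),
    ("N", ("dog",)), ("N", ("cat",)),
    ("V", ("sees",)), ("V", ("sleeps",)),
    ("P", ("with",)),
]

# W is taken from the text unit, here the sentence "the dog sleeps".
subgrammar = extract_subgrammar(RULES, NONTERMINALS, "S", {"the", "dog", "sleeps"})
```

In this toy example the preposition "with" is not in W, so the lexical rule for P is dropped in the first step, which in turn makes PP non-generating and removes both the PP rule and the NP rule that depends on it. Every remaining rule occurs in at least one parse tree of a sentence over W, so parse trees for those sentences are unchanged. This sketch covers only context-free grammars; it does not address the HPSG case, which the abstract states is NP-hard.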