An efficient algorithm to induce minimum average lookahead grammars for incremental LR parsing

  • Authors:
  • Dekai Wu; Yihai Shen

  • Affiliations:
  • Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong (both authors)

  • Venue:
  • IncrementParsing '04, Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together

  • Year:
  • 2004

Abstract

We define a new learning task, minimum average lookahead grammar induction, with strong potential implications for incremental parsing in NLP and cognitive models. Our thesis is that a suitable learning bias for grammar induction is to minimize the degree of lookahead required, on the underlying tenet that language evolution drove grammars to be efficiently parsable in incremental fashion. The input to the task is an unannotated corpus, plus a non-deterministic constraining grammar that serves as an abstract model of environmental constraints confirming or rejecting potential parses. The constraining grammar typically allows ambiguity and is itself poorly suited for an incremental parsing model, since it gives rise to a high degree of nondeterminism in parsing. The learning task, then, is to induce a deterministic LR(k) grammar under which it is possible to incrementally construct one of the correct parses for each sentence in the corpus, such that the average degree of lookahead needed to do so is minimized. This is a significantly more difficult optimization problem than merely compiling LR(k) grammars, since k is not specified in advance. Clearly, naïve approaches to this optimization can easily be computationally infeasible. However, by making combined use of GLR ancestor tables and incremental LR table construction methods, we obtain an O(n^3 + 2^m) greedy approximation algorithm for this task that is quite efficient in practice.
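
The abstract's objective, minimizing the average lookahead depth over all incremental parse decisions in a corpus (with k not fixed in advance), can be made concrete with a small sketch. The Python below is an illustration under assumptions, not the paper's O(n^3 + 2^m) algorithm: lookahead_profile and refinements are hypothetical stand-ins for the machinery the paper builds from GLR ancestor tables and incremental LR table construction.

    # Illustrative sketch only: steepest-descent greedy search over grammar
    # variants, scored by the minimum-average-lookahead objective described
    # in the abstract. All callables here are hypothetical placeholders.
    from typing import Callable, Iterable, List, Optional

    def average_lookahead(
        grammar: object,
        corpus: Iterable[str],
        lookahead_profile: Callable[[object, str], Optional[List[int]]],
    ) -> float:
        """Mean lookahead depth over every parse decision in the corpus.

        lookahead_profile (hypothetical) returns the lookahead depth k used
        at each shift/reduce decision, or None if the grammar cannot
        deterministically parse the sentence; such grammars score inf.
        """
        depths: List[int] = []
        for sentence in corpus:
            profile = lookahead_profile(grammar, sentence)
            if profile is None:  # grammar fails to cover this sentence
                return float("inf")
            depths.extend(profile)
        return sum(depths) / len(depths) if depths else 0.0

    def greedy_induce(
        seed_grammar: object,
        corpus: List[str],
        refinements: Callable[[object], Iterable[object]],
        lookahead_profile: Callable[[object, str], Optional[List[int]]],
    ) -> object:
        """Repeatedly adopt the refinement that most lowers the average
        lookahead; stop when no candidate improves on the current grammar."""
        best = seed_grammar
        best_cost = average_lookahead(best, corpus, lookahead_profile)
        while True:
            winner, winner_cost = None, best_cost
            for candidate in refinements(best):
                cost = average_lookahead(candidate, corpus, lookahead_profile)
                if cost < winner_cost:
                    winner, winner_cost = candidate, cost
            if winner is None:  # no refinement improves: local minimum
                return best
            best, best_cost = winner, winner_cost

Scoring unparsable sentences as infinite cost encodes the coverage requirement (one correct parse per corpus sentence). What this sketch does not model is the source of the paper's efficiency: reusing GLR ancestor tables and incrementally updating LR tables, rather than rebuilding tables and re-parsing the corpus from scratch for every candidate grammar.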