An efficient algorithm to induce minimum average lookahead grammars for incremental LR parsing

  • Authors:
  • Dekai Wu; Yihai Shen

  • Affiliations:
  • Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong (both authors)

  • Venue:
  • IncrementParsing '04, Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together

  • Year:
  • 2004

Abstract

We define a new learning task, minimum average lookahead grammar induction, with strong potential implications for incremental parsing in NLP and cognitive models. Our thesis is that a suitable learning bias for grammar induction is to minimize the degree of lookahead required, on the underlying tenet that language evolution drove grammars to be efficiently parsable in incremental fashion. The input to the task is an unannotated corpus, plus a non-deterministic constraining grammar that serves as an abstract model of environmental constraints confirming or rejecting potential parses. The constraining grammar typically allows ambiguity and is itself poorly suited for an incremental parsing model, since it gives rise to a high degree of nondeterminism in parsing. The learning task, then, is to induce a deterministic LR(k) grammar under which it is possible to incrementally construct one of the correct parses for each sentence in the corpus, such that the average degree of lookahead needed to do so is minimized. This is a significantly more difficult optimization problem than merely compiling LR(k) grammars, since k is not specified in advance. Clearly, naïve approaches to this optimization can easily be computationally infeasible. However, by making combined use of GLR ancestor tables and incremental LR table construction methods, we obtain an O(n^3 + 2^m) greedy approximation algorithm for this task that is quite efficient in practice.
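
The abstract's objective, minimizing the average lookahead depth over all incremental parse decisions in a corpus (with k not fixed in advance), can be made concrete with a small sketch. The Python below is an illustration under assumptions, not the paper's O(n^3 + 2^m) algorithm: lookahead_profile and refinements are hypothetical stand-ins for the machinery the paper builds from GLR ancestor tables and incremental LR table construction.

    # Illustrative sketch only: steepest-descent greedy search over grammar
    # variants, scored by the minimum-average-lookahead objective described
    # in the abstract. All callables here are hypothetical placeholders.
    from typing import Callable, Iterable, List, Optional

    def average_lookahead(
        grammar: object,
        corpus: Iterable[str],
        lookahead_profile: Callable[[object, str], Optional[List[int]]],
    ) -> float:
        """Mean lookahead depth over every parse decision in the corpus.

        lookahead_profile (hypothetical) returns the lookahead depth k used
        at each shift/reduce decision, or None if the grammar cannot
        deterministically parse the sentence; such grammars score inf.
        """
        depths: List[int] = []
        for sentence in corpus:
            profile = lookahead_profile(grammar, sentence)
            if profile is None:  # grammar fails to cover this sentence
                return float("inf")
            depths.extend(profile)
        return sum(depths) / len(depths) if depths else 0.0

    def greedy_induce(
        seed_grammar: object,
        corpus: List[str],
        refinements: Callable[[object], Iterable[object]],
        lookahead_profile: Callable[[object, str], Optional[List[int]]],
    ) -> object:
        """Repeatedly adopt the refinement that most lowers the average
        lookahead; stop when no candidate improves on the current grammar."""
        best = seed_grammar
        best_cost = average_lookahead(best, corpus, lookahead_profile)
        while True:
            winner, winner_cost = None, best_cost
            for candidate in refinements(best):
                cost = average_lookahead(candidate, corpus, lookahead_profile)
                if cost < winner_cost:
                    winner, winner_cost = candidate, cost
            if winner is None:  # no refinement improves: local minimum
                return best
            best, best_cost = winner, winner_cost

Scoring unparsable sentences as infinite cost encodes the coverage requirement (one correct parse per corpus sentence). What this sketch does not model is the source of the paper's efficiency: reusing GLR ancestor tables and incrementally updating LR tables, rather than rebuilding tables and re-parsing the corpus from scratch for every candidate grammar.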