We define a new learning task, minimum average lookahead grammar induction, with strong potential implications for incremental parsing in NLP and cognitive models. Our thesis is that a suitable learning bias for grammar induction is to minimize the degree of lookahead required, on the underlying tenet that language evolution drove grammars to be efficiently parsable in an incremental fashion. The input to the task is an unannotated corpus, plus a non-deterministic constraining grammar that serves as an abstract model of environmental constraints confirming or rejecting potential parses. The constraining grammar typically allows ambiguity and is itself poorly suited to an incremental parsing model, since it gives rise to a high degree of nondeterminism in parsing. The learning task, then, is to induce a deterministic LR(k) grammar under which it is possible to incrementally construct one of the correct parses for each sentence in the corpus, such that the average degree of lookahead needed to do so is minimized. This is a significantly more difficult optimization problem than merely compiling LR(k) grammars, since k is not specified in advance. Naïve approaches to this optimization quickly become computationally infeasible. However, by making combined use of GLR ancestor tables and incremental LR table construction methods, we obtain an O(n^3 + 2^m) greedy approximation algorithm for this task that is quite efficient in practice.
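
To make the notion of "degree of lookahead" concrete, the following minimal sketch measures how many upcoming tokens are needed to pick a parser action deterministically at a decision point, and then averages that quantity over several decision points. It is an illustration only, not the induction algorithm described above: the names `min_lookahead` and `average_lookahead`, the representation of a decision point as (action, upcoming tokens) pairs, and the choice to average uniformly over decision points are all assumptions made for this example.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical representation: one observation is the action taken at a
# decision point together with the tokens that followed it in the input.
Observation = Tuple[str, Tuple[str, ...]]


def min_lookahead(observations: Sequence[Observation]) -> Optional[int]:
    """Return the smallest k such that the first k upcoming tokens
    determine the action uniquely, or None if no k suffices."""
    max_len = max((len(ctx) for _, ctx in observations), default=0)
    for k in range(max_len + 1):
        action_for_prefix: Dict[Tuple[str, ...], str] = {}
        consistent = True
        for action, ctx in observations:
            prefix = ctx[:k]
            # Record the first action seen for this prefix; any later
            # observation with the same prefix must agree with it.
            if action_for_prefix.setdefault(prefix, action) != action:
                consistent = False
                break
        if consistent:
            return k
    return None


def average_lookahead(decision_points: Sequence[Sequence[Observation]]) -> float:
    """Average the minimum lookahead over decision points (assumed metric)."""
    ks = [min_lookahead(obs) for obs in decision_points]
    if not ks or any(k is None for k in ks):
        raise ValueError("some decision point cannot be resolved by lookahead")
    return sum(ks) / len(ks)


if __name__ == "__main__":
    points = [
        # Needs two tokens: both contexts start with "a".
        [("shift", ("a", "b")), ("reduce", ("a", "c"))],
        # Needs no lookahead: the action is always the same.
        [("shift", ("x",)), ("shift", ("y",))],
        # Needs one token.
        [("shift", ("a",)), ("reduce", ("b",))],
    ]
    print(average_lookahead(points))
```

In this toy setting, lowering the average means preferring analyses whose decision points can be settled with shorter prefixes of the remaining input; the task defined above additionally requires that a single deterministic LR(k) grammar account for the whole corpus, which this sketch does not attempt to model.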