Semantics and algorithms for data-dependent grammars

Authors:
Trevor Jim;Yitzhak Mandelbaum;David Walker
Affiliations:
AT&T Labs - Research, Florham Park, NJ, USA;AT&T Labs - Research, Florham Park, NJ, USA;Princeton University, Princeton, NJ, USA
Venue:
Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Year:
2010

Abstract

We present the design and theory of a new parsing engine, YAKKER, capable of satisfying the many needs of modern programmers and modern data processing applications. In particular, our new parsing engine handles (1) full scannerless context-free grammars with (2) regular expressions as right-hand sides for defining nonterminals. YAKKER also includes (3) facilities for binding variables to intermediate parse results and (4) using such bindings within arbitrary constraints to control parsing. These facilities allow the kind of data-dependent parsing commonly needed in systems applications, particularly those that operate over binary data. In addition, (5) nonterminals may be parameterized by arbitrary values, which gives the system good modularity and abstraction properties in the presence of data-dependent parsing. Finally, (6) legacy parsing libraries, such as sophisticated libraries for dates and times, may be directly incorporated into parser specifications. We illustrate the importance and utility of this rich collection of features by presenting its use on examples ranging from difficult programming language grammars to web server logs to binary data specification. We also show that our grammars have important compositionality properties and explain why such properties are important in modern applications such as automatic grammar induction. In terms of technical contributions, we provide a traditional high-level semantics for our new grammar formalization and show how to compile grammars into nondeterministic automata. These automata are stack-based, somewhat like conventional push-down automata, but are also equipped with environments to track data-dependent parsing state. We prove the correctness of our translation of data-dependent grammars into these new automata and then show how to implement the automata efficiently using a variation of Earley's parsing algorithm.
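
To make features (3) to (5) concrete, the following is a minimal hand-written sketch in Python of data-dependent parsing for a length-prefixed literal such as `{5}hello`. It is not YAKKER's grammar notation and it does not use the automaton or Earley-based algorithm from the paper. The input format and the names `parse_number`, `parse_chars`, and `parse_literal` are assumptions chosen purely for illustration.

```python
from typing import Optional, Tuple

DIGITS = "0123456789"


def parse_number(data: str, pos: int) -> Optional[Tuple[int, int]]:
    """Parse one or more ASCII digits; return (value, next position)."""
    start = pos
    while pos < len(data) and data[pos] in DIGITS:
        pos += 1
    if pos == start:
        return None  # no digits found
    return int(data[start:pos]), pos


def parse_chars(data: str, pos: int, n: int) -> Optional[Tuple[str, int]]:
    """Hypothetical 'nonterminal parameterized by a value': exactly n characters."""
    if pos + n > len(data):
        return None  # constraint fails: fewer than n characters remain
    return data[pos:pos + n], pos + n


def parse_literal(data: str, pos: int = 0) -> Optional[Tuple[str, int]]:
    """Parse '{' number '}' followed by exactly `number` characters."""
    if pos >= len(data) or data[pos] != "{":
        return None
    numbered = parse_number(data, pos + 1)
    if numbered is None:
        return None
    n, pos = numbered  # bind n to an intermediate parse result
    if pos >= len(data) or data[pos] != "}":
        return None
    # The bound value n now controls how the rest of the input is parsed.
    return parse_chars(data, pos + 1, n)
```

For example, `parse_literal("{5}hello world")` yields the payload `"hello"` and the position just past it, while `parse_literal("{5}hi")` fails because the length constraint cannot be met. A context-free grammar alone cannot express this dependency between the parsed number and the length of what follows, which is the kind of gap the binding and constraint facilities described above are meant to close.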