Proceedings of the 27th Annual ACM Symposium on Applied Computing
Automation of grammar recovery is an important research area that has received attention over the last decade and a half. Documentation for software languages is abundant and will only keep growing, so there is a need for reliable extraction techniques that allow grammar engineers to derive useful information from it. This information can then be used to build grammarware, such as parsers or test generators, or to perform grammar investigation. Grammars obtained systematically from existing sources are preferable to manually constructed ones because their issues, including errors and design weaknesses, can be traced back to the source. This paper focuses on automated grammar recovery from sources that use a family of metasyntaxes known as EBNF: many language specifications extend the well-studied Backus-Naur Form in different directions, resulting in an unnecessary diversity of syntactic notations. To enable manipulation of EBNF families, we use EDD, the EBNF Dialect Definition, a recently published DSL for notation specification, and base our approach on human-specified indications that guide the subsequent automated, heuristic-based recovery process. Two scenarios are considered: a reliable syntactic notation and an unreliable one. The latter is considerably more difficult to handle, but also substantially more useful because it is so often encountered in practice. The proposed approach has been verified by two prototypes that were capable of extracting dozens of grammars written in 42 different syntactic notations.
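To illustrate what recovery parameterised by a notation can look like in the reliable-notation scenario, the following is a minimal sketch in Python. It is not EDD and not either of the prototypes: the `Notation` class, its four fields and the `recover` function are hypothetical names introduced here, and the splitting logic assumes that metasymbols never occur inside quoted terminals, which is exactly the kind of assumption an unreliable notation would violate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notation:
    """Hypothetical, heavily simplified notation specification (not EDD)."""
    defining_symbol: str   # separates a nonterminal from its definition
    terminator: str        # ends a production
    choice_symbol: str     # separates alternatives
    terminal_quote: str    # opens and closes a terminal symbol


def classify(token, notation):
    """Tag a token as a terminal or a nonterminal, dropping the quotes."""
    q = notation.terminal_quote
    if len(token) >= 2 * len(q) and token.startswith(q) and token.endswith(q):
        return ("terminal", token[len(q):-len(q)])
    return ("nonterminal", token)


def recover(text, notation):
    """Split text into productions using only the given metasymbols.

    Assumes a reliable notation: metasymbols never appear inside terminals.
    """
    grammar = {}
    for chunk in text.split(notation.terminator):
        if notation.defining_symbol not in chunk:
            continue  # blank or non-grammar fragment
        lhs, rhs = chunk.split(notation.defining_symbol, 1)
        alternatives = [
            [classify(tok, notation) for tok in alt.split()]
            for alt in rhs.split(notation.choice_symbol)
        ]
        grammar.setdefault(lhs.strip(), []).extend(alternatives)
    return grammar


# Two dialects describing the same toy grammar.
iso_like = Notation("=", ";", "|", '"')
bnf_like = Notation("::=", "\n", "|", "'")

g_iso = recover('expr = term | expr "+" term ; term = "x" ;', iso_like)
g_bnf = recover("expr ::= term | expr '+' term\nterm ::= 'x'\n", bnf_like)
```

Under these assumptions both inputs yield the same internal representation, so everything downstream of `recover` is independent of the dialect used in the source. Handling an unreliable notation would additionally require heuristics for repairing inconsistently used metasymbols, which this sketch does not attempt.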