Multibox parsers

  • Authors: Lev J. Dyadkin

  • Venue: ACM SIGSOFT Software Engineering Notes
  • Year: 1994

Abstract

Traditional compiler front-end generators such as Lex and Yacc assume a front end consisting of two boxes: a lexical box and a syntax box. Lex produces a lexical analyzer from regular expressions that describe the tokens. Yacc generates a syntax analyzer from an LALR grammar for the parsed language. This approach runs into serious problems with lexically and syntactically complex languages such as Fortran. The main reason is that regular expressions, being equivalent to right-linear grammars, cannot describe the highly complex lexical structure of Fortran. As a result, compiler writers abandon Lex and write Fortran lexers by hand, defeating the main purpose of a parser generator: automation. This work solves these problems by introducing a multibox parser, in which each lower box transforms its input language to produce a more "straightened" output language for the next higher box. The number of boxes reflects the complexity of the parsed language. For example, Fortran requires more boxes than C does. Each box is represented by an L-attributed translation grammar in simple assignment form with an LL(1) input grammar. LL(1) grammars were chosen for their higher speed and smaller size, and because, unlike regular expressions, they can express constructs such as nested parentheses, a capability required for parsing Fortran at the lexical level. New operations are added to the LL(1) machine to ensure that it moves strictly forward through the source code, without backtracking. We have extended LL(1) grammars to "indexed LL(1) grammars." This enhancement allows more of the resulting code to be generated automatically rather than written by hand. We have developed new parser-generating tools to support this technology. The multibox approach has been implemented in the Lahey Fortran 90 compiler.
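
To make the idea of stacked boxes concrete, the following is a minimal Python sketch, not the paper's method: the boxes described above are generated from LL(1) translation grammars, whereas these are hand-written generators. All names (`strip_blanks`, `mark_do_statements`, `run_boxes`, and the `<DO>` and `<ASSIGN>` tags) are hypothetical. The sketch assumes fixed-form Fortran, where blanks are insignificant, and ignores character literals, Hollerith constants, and every statement type other than DO and assignment. It shows a lower box tracking parenthesis depth in a single forward pass so that the next box receives an explicitly tagged, "straightened" statement.

```python
from typing import Iterable, Iterator


def strip_blanks(statements: Iterable[str]) -> Iterator[str]:
    """Box 1 (hypothetical): remove blanks and fold case."""
    for stmt in statements:
        yield stmt.replace(" ", "").upper()


def has_top_level_comma_after_equals(stmt: str) -> bool:
    """Scan once, left to right, tracking parenthesis depth.

    Unbounded nesting depth is what a regular expression cannot track.
    """
    depth = 0
    seen_equals = False
    for ch in stmt:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "=" and depth == 0:
            seen_equals = True
        elif ch == "," and depth == 0 and seen_equals:
            return True
    return False


def mark_do_statements(statements: Iterable[str]) -> Iterator[str]:
    """Box 2 (hypothetical): tag each statement for the next box."""
    for stmt in statements:
        if stmt.startswith("DO") and has_top_level_comma_after_equals(stmt):
            yield "<DO> " + stmt[2:]
        else:
            yield "<ASSIGN> " + stmt


def run_boxes(source_lines: Iterable[str]) -> list[str]:
    """Chain the boxes: each consumes the output language of the one below."""
    return list(mark_do_statements(strip_blanks(source_lines)))


if __name__ == "__main__":
    for line in run_boxes(["DO 10 I = 1, 5", "DO 10 I = 1.5", "DO 10 I = F(1, 5)"]):
        print(line)
```

In this sketch, `DO 10 I = 1, 5` is tagged as a loop, while `DO 10 I = 1.5` and `DO 10 I = F(1, 5)` are tagged as assignments to the variable `DO10I`, because neither has a comma outside parentheses after the equals sign. A box above these two could then tokenize each statement without having to resolve that ambiguity itself.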