Multibox Parsers: No More Handwritten Lexical Analyzers

  • Authors: Lev J. Dyadkin
  • Affiliations: -
  • Venue: IEEE Software
  • Year: 1995

Abstract

Tools are available to generate the parser part of the compiler front end from the grammar describing the language being parsed. Tools like Lex/Yacc assume that the parser has two parts, or "boxes": the lexical analyzer and the syntax analyzer. This approach poses significant problems for lexically complex languages like Fortran because one box for the entire lexical analysis is not enough to express grammatically the level of complexity. As a result, compiler writers abandon Lex (which does the lexical analysis) and produce handwritten lexical analyzers, thus defeating the main purpose of the parser generator, which is to automate the production of the entire parser. An alternative to the two-box parser, and one that overcomes these complexity problems, is the multibox parser. Instead of having a box for lexical analysis and a box for syntax analysis, the multibox parser has a string of boxes. Each box modifies its input language to produce a more "straightened" output language for the next box. The number of boxes needed depends on the complexity of the language to be parsed. This multibox approach allows the automatic generation of a lexical analyzer regardless of the language to be parsed because it has enough boxes to handle the level of lexical complexity, even in languages as complex as the new Fortran 90 standard. Although the approach has been used in constructing compilers only for Fortran 90, it is suitable for the construction of compilers for other languages as well. In this case, the number and design of the boxes and corresponding grammars must provide for the new language.
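The "string of boxes" idea can be illustrated with a small sketch. It is not the author's design: the three boxes, their names (strip_blanks, mark_statement_kind, tokenize), and the hand-written Python are assumptions chosen only to show the shape of such a pipeline, whereas the paper describes boxes generated from grammars. The example uses the classic fixed-form Fortran ambiguity in which "DO 10 I = 1, 5" is a loop header but "DO 10 I = 1.5" is an assignment to the variable DO10I, so a single lexical pass cannot decide how to split the text without first looking ahead.

```python
import re
from functools import reduce

# Hypothetical three-box pipeline; box names and boundaries are illustrative only.

def strip_blanks(line):
    """Box 1: drop blanks, which are insignificant in fixed-form Fortran.
    Simplifying assumption: the line contains no character literals."""
    return line.replace(" ", "")

def has_top_level_comma(text):
    """Return True if text has a comma that is not nested inside parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True
    return False

def mark_statement_kind(line):
    """Box 2: tag the line as a DO loop or an assignment.
    A comma outside parentheses after '=' signals a DO statement."""
    m = re.match(r"DO(\d+)([A-Z][A-Z0-9]*)=(.+)", line)
    if m and has_top_level_comma(m.group(3)):
        return ("DO", line)
    return ("ASSIGN", line)

# Real literals are tried before integers so that "1.5" stays one token.
TOKEN = re.compile(r"\d+\.\d*|\.\d+|\d+|[A-Z][A-Z0-9]*|[=,()+\-*/]")

def tokenize(tagged):
    """Box 3: split into tokens, now that the statement kind is known."""
    kind, text = tagged
    if kind == "DO":
        m = re.match(r"DO(\d+)(.*)", text)
        return ["DO", m.group(1)] + TOKEN.findall(m.group(2))
    return TOKEN.findall(text)

BOXES = [strip_blanks, mark_statement_kind, tokenize]

def run_boxes(line, boxes=BOXES):
    """Feed the output of each box to the next one, left to right."""
    return reduce(lambda data, box: box(data), boxes, line)

print(run_boxes("DO 10 I = 1, 5"))  # ['DO', '10', 'I', '=', '1', ',', '5']
print(run_boxes("DO 10 I = 1.5"))   # ['DO10I', '=', '1.5']
```

Each box receives a slightly more regular language than the one before it: the second box never sees blanks, and the third never has to guess the statement kind. Handling more of the language would mean adding or refining boxes rather than complicating a single lexer, which is the trade-off the abstract describes.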