LL(*): the foundation of the ANTLR parser generator

  • Authors:
  • Terence Parr;Kathleen Fisher

  • Affiliations:
  • University of San Francisco, San Francisco, CA, USA;Tufts University, Boston, MA, USA

  • Venue:
  • Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Despite the power of Parser Expression Grammars (PEGs) and GLR, parsing is not a solved problem. Adding nondeterminism (parser speculation) to traditional LL and LR parsers can lead to unexpected parse-time behavior and introduces practical issues with error handling, single-step debugging, and side-effecting embedded grammar actions. This paper introduces the LL(*) parsing strategy and an associated grammar analysis algorithm that constructs LL(*) parsing decisions from ANTLR grammars. At parse-time, decisions gracefully throttle up from conventional fixed k=1 lookahead to arbitrary lookahead and, finally, fail over to backtracking depending on the complexity of the parsing decision and the input symbols. LL(*) parsing strength reaches into the context-sensitive languages, in some cases beyond what GLR and PEGs can express. By statically removing as much speculation as possible, LL(*) provides the expressivity of PEGs while retaining LL's good error handling and unrestricted grammar actions. Widespread use of ANTLR (over 70,000 downloads/year) shows that it is effective for a wide variety of applications.