Efficient and flexible incremental parsing

  • Authors:
  • Tim A. Wagner;Susan L. Graham

  • Affiliations:
  • University of Califonia, Berkeley;University of Califonia, Berkeley

  • Venue:
  • ACM Transactions on Programming Languages and Systems (TOPLAS)
  • Year:
  • 1998

Quantified Score

Hi-index 0.00

Visualization

Abstract

Previously published algorithms for LR (k) incremental parsing are inefficient, unnecessarily restrictive, and in some cases incorrect. We present a simple algorithm based on parsing LR(k) sentential forms that can incrementally parse an arbitrary number of textual and/or structural modifications in optimal time and with no storage overhead. The central role of balanced sequences in achieving truly incremental behavior from analysis algorithms is described, along with automated methods to support balancing during parse table generation and parsing. Our approach extends the theory of sentential-form parsing to allow for ambiguity in the grammar, exploiting it for notational convenience, to denote sequences, and to construct compact (“abstract”) syntax trees directly. Combined, these techniques make the use of automatically generated incremental parsers in interactive software development environments both practical and effective. In addition, we address information preservation in these environments: Optimal node reuse is defined; previous definitions are shown to be insufficient; and a method for detecting node reuse is provided that is both simpler and faster than existing techniques. A program representation based on self-versioning documents is used to detect changes in the program, generate efficient change reports for subsequent analyses, and allow the parsing transformation itself to be treated as a reversible modification in the edit log.