Parallel non-deterministic bottom-up parsing

  • Authors:
  • Bernard Lang

  • Affiliations:
  • Center for Research in Computing Technology, Harvard University

  • Venue:
  • Proceedings of the international symposium on Extensible languages
  • Year:
  • 1971

Quantified Score

Hi-index 0.00

Visualization

Abstract

The development of translator writing systems and extensible languages has led to a simultaneous development of more efficient and general syntax analyzers, usually for context-free (CF) syntax. Our paper describes a type of parser that can be used with reasonable efficiency for any CF grammar, even one which is ambiguous. Our parser, like most others, is based on the pushdown automaton model, and thus its generality requires it to be non-deterministic (in the automata theoretic sense). An actual implementation of a non-deterministic automaton requires that we explore every computational path that could be followed by the theoretical automaton. This can be done serially [5], following successively every computational path whenever a choice occurs. It leads to the algorithms described in [1], whose time bounds can be exponential functions of the length of the string to be parsed. We can also use a parallel implementation which consists in following “simultaneously” all the possible computational paths whenever a non-deterministic choice occurs. Then we can merge paths that have ceased to be different after a certain point. Those mergings reduce drastically the amount of computation. The best known example is the top-down algorithm described in [2]. The algorithm we describe in the first part of our paper is a basic bottom-up parser, similar to [2] in its organization. Both parsers can be shown to work within time bounds which are at most the cube of the length of the input string, and are often a linear function of it. The space bounds are at most the square of that length. The second part of our paper deals with the optimization of the basic parallel bottom-up algorithm, using the properties of weak precedence relations [3]. The various optimization techniques further reduce the amount of computation required and cause frequent occurrence of a “sparse determinism” phenomenon which allows determination of part of the parse-tree before the non-deterministic analysis is ended. These optimizations also considerably lessen the space requirements. Full details can be found in [6]. The parser is easy to generate for any CF grammar, requiring only the computation of precedence tables. It is slower than most deterministic parsers (on the order of ten times); but all examples we have tried showed it to be more efficient than any other parser having the same generality. We consider that the inefficiency of our parser is reasonable as we intend it as a research tool for the language designer rather than as a part of an industrial compiler. Languages designers, or extensible languages users, are often more preoccupied with semantics than syntax and fix the syntax of their language in a “nice” form only when they have found what kind of semantics it is to represent. So they can easily come up with ambiguous syntax [4]. Our parser parses ambiguous languages and detects ambiguities in programs, thus enabling the programmer to eliminate them.