An efficient augmented-context-free parsing algorithm

  • Authors:
  • Masaru Tomita

  • Affiliations:
  • Carnegie-Mellon University, Pittsburgh, PA

  • Venue:
  • Computational Linguistics
  • Year:
  • 1987

Quantified Score

Hi-index 0.00

Visualization

Abstract

An efficient parsing algorithm for augmented context-free grammars is introduced, and its application to on-line natural language interfaces discussed. The algorithm is a generalized LR parsing algorithm, which precomputes an LR shift-reduce parsing table (possibly with multiple entries) from a given augmented context-free grammar. Unlike the standard LR parsing algorithm, it can handle arbitrary context-free grammars, including ambiguous grammars, while most of the LR efficiency is preserved by introducing the concept of a "graph-structured stack". The graph-structured stack allows an LR shift-reduce parser to maintain multiple parses without parsing any part of the input twice in the same way. We can also view our parsing algorithm as an extended chart parsing algorithm efficiently guided by LR parsing tables. The algorithm is fast, due to the LR table precomputation. In several experiments with different English grammars and sentences, timings indicate a five- to tenfold speed advantage over Earley's context-free parsing algorithm.The algorithm parses a sentence strictly from left to right on-line, that is, it starts parsing as soon as the user types in the first word of a sentence, without waiting for completion of the sentence. A practical on-line parser based on the algorithm has been implemented in Common Lisp, and running on Symbolics and HP AI workstations. The parser is used in the multi-lingual machine translation project at CMU. Also, a commercial on-line parser for Japanese language is being built by Intelligent Technology Incorporation, based on the technique developed at CMU.