An efficient chart-based algorithm for partial-parsing of unrestricted texts

  • Authors: David D. McDonald
  • Affiliations: Arlington, MA
  • Venue: ANLC '92 Proceedings of the third conference on Applied natural language processing
  • Year: 1992

Abstract

We present an efficient algorithm for chart-based phrase structure parsing of natural language that is tailored to the problem of extracting specific information from unrestricted texts where many of the words are unknown and much of the text is irrelevant to the task. The parser gains algorithmic efficiency through a reduction of its search space. As each new edge is added to the chart, the algorithm checks only the topmost of the edges adjacent to it, rather than all such edges as in conventional treatments. The resulting spanning edges are ensured to be the correct ones by carefully controlling the order in which edges are introduced so that every final constituent covers the longest possible span. This is facilitated through the use of phrase boundary heuristics based on the placement of function words, and by heuristic rules that permit certain kinds of phrases to be deduced despite the presence of unknown words. A further reduction in the search space is achieved by using semantic rather than syntactic categories on the terminal and nonterminal edges, thereby reducing the amount of ambiguity and thus the number of edges, since only edges with a valid semantic interpretation are ever introduced.
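
The topmost-edge check described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the `Edge` and `Chart` classes, the `parse` helper, the semantic category names, and the binary rule table are all hypothetical, chosen only to show how a chart might consult a single (topmost) neighbor per position instead of every adjacent edge. The phrase boundary heuristics and unknown-word rules mentioned in the abstract are not modeled; here an unknown word simply introduces no edge.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    label: str  # semantic category (hypothetical labels, not from the paper)
    start: int  # chart position where the edge begins
    end: int    # chart position where the edge ends


# Hypothetical binary rules over semantic categories: (left, right) -> parent.
RULES: Dict[Tuple[str, str], str] = {
    ("title", "person-name"): "person",
    ("person", "position-verb"): "job-event-head",
    ("job-event-head", "company"): "job-event",
}


class Chart:
    def __init__(self) -> None:
        # Only ONE edge is remembered per position in each direction:
        # the most recently added, which in this sketch is the longest.
        self.ends_at: Dict[int, Edge] = {}
        self.starts_at: Dict[int, Edge] = {}
        self.edges: List[Edge] = []

    def add(self, edge: Edge) -> None:
        self.edges.append(edge)
        # Look up the neighbors before recording the new edge.
        left = self.ends_at.get(edge.start)    # topmost edge ending where we start
        right = self.starts_at.get(edge.end)   # topmost edge starting where we end
        self.ends_at[edge.end] = edge
        self.starts_at[edge.start] = edge
        # Check only the topmost left neighbor, not every edge ending there.
        if left is not None:
            parent = RULES.get((left.label, edge.label))
            if parent is not None:
                self.add(Edge(parent, left.start, edge.end))
                return
        # Likewise, check only the topmost right neighbor.
        if right is not None:
            parent = RULES.get((edge.label, right.label))
            if parent is not None:
                self.add(Edge(parent, edge.start, right.end))


def parse(tokens: List[Tuple[str, Optional[str]]]) -> Chart:
    """Scan left to right; words with no known category add no edge."""
    chart = Chart()
    for position, (_word, category) in enumerate(tokens):
        if category is not None:
            chart.add(Edge(category, position, position + 1))
    return chart


full = parse([("Mr.", "title"), ("Smith", "person-name"),
              ("joined", "position-verb"), ("Acme", "company")])
partial = parse([("Mr.", "title"), ("Smith", "person-name"),
                 ("reportedly", None),
                 ("joined", "position-verb"), ("Acme", "company")])
```

In this sketch, `full.starts_at[0]` ends up holding a single `job-event` edge spanning all four words, while in `partial` the unknown word leaves a gap, so only the `person` phrase is built and the rest of the text stays unanalyzed. Because rules are keyed on semantic categories, an edge is created only when the pair has an entry in the rule table, which keeps the number of edges small.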