The representation of constituent structures for finite-state parsing

  • Authors: D. Terence Langendoen; Yedidyah Langsam
  • Affiliations: Brooklyn College of the City University of New York, Brooklyn, New York (both authors)
  • Venue: ACL '84 Proceedings of the 10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics
  • Year: 1984

Abstract

A mixed prefix-postfix notation is proposed for representing the constituent structures of the expressions of natural languages. These representations have only a limited degree of center embedding if the original expressions are noncenter-embedding. The method of constructing them also applies to expressions with center embedding, and yields representations that seem to reflect the ways in which people actually parse those expressions. Both the representations and their interpretations can be computed from the expressions from left to right by finite-state devices.

The acceptable expressions of a natural language L all manifest no more than a small, fixed, finite degree n of center embedding. It follows from this observation that the ability of human beings to parse the expressions of L can be modeled by a finite transducer that associates each acceptable expression of L with a representation of its structural description. This paper considers some initial steps in the construction of such a model.

The first step is to determine a method of representing the class of constituent structures of the expressions of L without center embedding in such a way that the members of that class themselves have no more than a small, fixed, finite degree of center embedding. Given a grammar that directly generates that class of constituent structures, it is not difficult to construct a deterministic finite-state transducer (parser) that assigns the appropriate members of that class to the noncenter-embedded expressions of L from left to right.

The second step is to extend the method so that it can represent the constituent structures of expressions of L with no more than degree n of center embedding, in a manner that appears to accord with the way in which human beings actually parse those sentences. Given certain reasonable assumptions about the character of the rules of grammar of natural languages, we show how this step can also be taken.
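The abstract does not spell out the authors' notation, so the following is only a hypothetical sketch of why a single bracket, prefix, or postfix notation can introduce center embedding into the representation of a noncenter-embedded structure, and why mixing prefix and postfix can avoid it. It assumes binary trees, an invented combining marker `*`, and an invented mixing rule (write a node in prefix form when its right daughter is a phrase, in postfix form otherwise). It also assumes a simple operational definition: a phrasal constituent is center-embedded in its mother when the mother's token span has material on both sides of it, and the degree of center embedding is the largest number of such links on any root-to-node path. None of these names or rules come from the paper.

```python
from typing import List, Optional, Tuple, Union

# A constituent is either a word (str) or a binary phrase (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]
# Span of a phrasal constituent: (start, end, spans of phrasal daughters).
Span = Tuple[int, int, list]

OP = "*"  # hypothetical marker meaning "combine two constituents"


def serialize(tree: Tree, mode: str, tokens: List[str]) -> Optional[Span]:
    """Append the encoding of `tree` to `tokens`; return its span (None for words)."""
    if isinstance(tree, str):
        tokens.append(tree)
        return None
    start = len(tokens)
    left, right = tree
    node_mode = mode
    if mode == "mixed":
        # Assumed rule: prefix if the right daughter is a phrase, else postfix.
        node_mode = "prefix" if isinstance(right, tuple) else "postfix"
    if node_mode == "prefix":
        tokens.append(OP)
    elif node_mode == "bracket":
        tokens.append("[")
    daughters = [serialize(left, mode, tokens), serialize(right, mode, tokens)]
    if node_mode == "postfix":
        tokens.append(OP)
    elif node_mode == "bracket":
        tokens.append("]")
    # Only phrasal daughters are tracked; words are not counted as constituents here.
    return (start, len(tokens), [d for d in daughters if d is not None])


def degree(span: Span) -> int:
    """Max number of center-embedded phrases on any root-to-node path."""
    start, end, daughters = span
    best = 0
    for daughter in daughters:
        d_start, d_end, _ = daughter
        # Center-embedded: the mother has material on both sides of the daughter.
        centered = d_start > start and d_end < end
        best = max(best, int(centered) + degree(daughter))
    return best


def encode(tree: Tree, mode: str) -> Tuple[str, int]:
    """Return the encoded token string and its degree of center embedding."""
    tokens: List[str] = []
    root = serialize(tree, mode, tokens)
    return " ".join(tokens), (degree(root) if root is not None else 0)


def right_branching(words: List[str]) -> Tree:
    """Build (w1, (w2, (... wn))) from a word list."""
    tree: Tree = words[-1]
    for word in reversed(words[:-1]):
        tree = (word, tree)
    return tree


def left_branching(words: List[str]) -> Tree:
    """Build (((w1, w2), ...), wn) from a word list."""
    tree: Tree = words[0]
    for word in words[1:]:
        tree = (tree, word)
    return tree


if __name__ == "__main__":
    words = list("abcde")
    for name, tree in [("right", right_branching(words)),
                       ("left", left_branching(words))]:
        for mode in ("bracket", "prefix", "postfix", "mixed"):
            text, deg = encode(tree, mode)
            print(f"{name:5} {mode:7} degree={deg}  {text}")
```

With five words, bracket notation gives both trees degree 3, prefix gives the left-branching tree degree 3, and postfix gives the right-branching tree degree 3. Each of these degrees grows with the length of the expression. By contrast, prefix keeps the right-branching tree at degree 0, postfix keeps the left-branching tree at degree 0, and the assumed mixed rule keeps both at degree 0 (for example, the right-branching tree encodes as `* a * b * c d e *`).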