The representation of constituent structures for finite-state parsing

Authors:
D. Terence Langendoen; Yedidyah Langsam
Affiliations:
Brooklyn College of the City University of New York, Brooklyn, New York; Brooklyn College of the City University of New York, Brooklyn, New York
Venue:
ACL '84 Proceedings of the 10th International Conference on Computational Linguistics and 22nd annual meeting on Association for Computational Linguistics
Year:
1984



Abstract

A mixed prefix-postfix notation is proposed for representing the constituent structures of the expressions of natural languages. The resulting representations have a limited degree of center embedding if the original expressions are noncenter-embedding. The method of constructing these representations is also applicable to expressions with center embedding, and results in representations which seem to reflect the ways in which people actually parse those expressions. Both the representations and their interpretations can be computed from the expressions from left to right by finite-state devices.

The acceptable expressions of a natural language L all manifest no more than a small, fixed, finite degree n of center embedding. From this observation, it follows that the ability of human beings to parse the expressions of L can be modeled by a finite transducer that associates with the acceptable expressions of L representations of the structural descriptions of those expressions. This paper considers some initial steps in the construction of such a model. The first step is to determine a method of representing the class of constituent structures of the expressions of L without center embedding in such a way that the members of that class themselves have no more than a small, fixed, finite degree of center embedding. Given a grammar that directly generates that class of constituent structures, it is not difficult to construct a deterministic finite-state transducer (parser) that assigns the appropriate members of that class to the noncenter-embedded expressions of L from left to right. The second step is to extend the method so that it is capable of representing the class of constituent structures of expressions of L with no more than degree n of center embedding in a manner which appears to accord with the way in which human beings actually parse those sentences. Given certain reasonable assumptions about the character of the rules of grammar of natural languages, we show how this step can also be taken.
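
To make "degree of center embedding" concrete, here is a minimal Python sketch that measures it on an ordinary bracketed structure. It is an illustration only and does not implement the paper's mixed prefix-postfix notation. It assumes the common definition under which a constituent B is center-embedded in a constituent A when A contains words both to the left and to the right of B, and the degree is the length of the longest chain of such nestings. The unlabeled square-bracket input format and the function names `constituent_spans` and `center_embedding_degree` are hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive


def constituent_spans(text: str) -> List[Span]:
    """Return one word-index span per bracketed constituent in `text`."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    spans: List[Span] = []
    open_starts: List[int] = []
    pos = 0  # number of words read so far
    for tok in tokens:
        if tok == "[":
            open_starts.append(pos)
        elif tok == "]":
            if not open_starts:
                raise ValueError("unbalanced ']'")
            spans.append((open_starts.pop(), pos))
        else:
            pos += 1
    if open_starts:
        raise ValueError("unbalanced '['")
    return spans


def center_embedding_degree(text: str) -> int:
    """Longest chain of constituents, each strictly internal to the previous.

    Assumed definition (not taken from the paper): B is center-embedded in A
    if A has at least one word before B and at least one word after B.
    """
    degree: Dict[Span, int] = {}
    # Shorter spans first, so every strictly internal span is already scored.
    for start, end in sorted(set(constituent_spans(text)),
                             key=lambda se: se[1] - se[0]):
        inner = [d + 1 for (a, b), d in degree.items() if start < a and b < end]
        degree[(start, end)] = max(inner, default=0)
    return max(degree.values(), default=0)


right_branching = "[the [cat [chased [the rat]]]]"
doubly_embedded = "[[the rat [[the cat [[the dog] chased]] bit]] died]"
print(center_embedding_degree(right_branching))
print(center_embedding_degree(doubly_embedded))
```

Under the assumed definition, the right-branching structure has degree 0 and the doubly embedded relative-clause structure has degree 2. A fixed bound n on this quantity is the property the text relies on when it says parsing can be modeled by a finite-state device.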