Synchronous and multicomponent tree-adjoining grammars: complexity, algorithms and linguistic applications

  • Authors:
  • Stuart M. Shieber;Rebecca Nancy Nesson

  • Affiliations:
  • Harvard University;Harvard University

  • Venue:
  • Synchronous and multicomponent tree-adjoining grammars: complexity, algorithms and linguistic applications
  • Year:
  • 2009

Abstract

This thesis addresses the design of appropriate formalisms and algorithms to be used for natural language processing. This entails a delicate balance between the ability of a formalism to capture the linguistic generalizations required by natural language processing applications and the ability of applications built on the formalism to process it efficiently enough to be useful. I focus on the Tree-Adjoining Grammar (TAG) formalism as a base and on the mechanism of grammar synchronization for managing relationships between the input and output of a natural language processing system. Grammar synchronization is a formal concept by which the derivations of two distinct grammars occur in tandem so that a single isomorphic derivation produces distinct derived structures in each of the synchronized grammars. Using synchronization implies a strong assumption, one that I seek to justify in the second part of the thesis: that certain critical relationships in natural language applications, such as the relationship between the syntax and semantics of a language or the relationship between the syntax of two natural languages, are close enough to be expressed with grammars that share a derivational structure. The extent of the isomorphism between the derived structures of the related languages is determined only in part by the synchronization. The base formalism chosen can offer greater or lesser opportunity for divergence in the derived structures.

My choice of a base formalism is motivated directly by research into applications of synchronous TAG-based grammars to two natural language applications: semantic interpretation and natural language translation. I first examine a range of TAG variants that have not previously been studied at this level of detail to determine their computational properties and to develop algorithms that can be used to process them. Original results on the complexity of these formalisms are presented, as well as novel algorithms for factorizing grammars to reduce the time required to process them. In Part II, I develop applications of synchronous Limited Delay Tree-Local Multicomponent TAG to semantic interpretation and of probabilistic synchronous Tree Insertion Grammar to statistical natural language translation.
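
The idea of synchronization described in the abstract, one shared derivation yielding a different derived structure on each side, can be illustrated with a minimal sketch. This is not the thesis's formalism: it uses simple paired substitution rules rather than TAG adjunction, and the rule names, the `Node` class, and the `realize` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical paired rules (an assumption for illustration, not taken from the thesis).
# Each rule has a syntax template and a semantics template. Integers are linked
# sites: the same integer on both sides is filled by the same sub-derivation.
RULES = {
    "likes": ([1, " likes ", 2], ["likes(", 1, ", ", 2, ")"]),
    "john": (["John"], ["john"]),
    "mary": (["Mary"], ["mary"]),
}

SYNTAX, SEMANTICS = 0, 1


@dataclass
class Node:
    """One step of the shared derivation: a rule name plus the
    sub-derivations that fill its linked sites."""
    rule: str
    children: dict = field(default_factory=dict)  # link index -> Node


def realize(node, side):
    """Build the derived string for one side of the synchronized pair."""
    pieces = []
    for item in RULES[node.rule][side]:
        if isinstance(item, int):
            # Linked site: recurse into the same sub-derivation on this side.
            pieces.append(realize(node.children[item], side))
        else:
            pieces.append(item)
    return "".join(pieces)


# A single derivation tree, shared by both sides.
derivation = Node("likes", {1: Node("john"), 2: Node("mary")})

source = realize(derivation, SYNTAX)
semantics = realize(derivation, SEMANTICS)
print(source)
print(semantics)
```

Both outputs come from the same `derivation` object; only the templates differ. In this sketch, how far the two derived forms can diverge depends on what the templates are allowed to express, which loosely mirrors the abstract's point that the base formalism, and not only the synchronization, determines the room for divergence.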