Automated extraction of Tree-Adjoining Grammars from treebanks

  • Authors:
  • John Chen; Srinivas Bangalore; K. Vijay-Shanker

  • Affiliations:
  • Microsoft Research Asia, No. 49 Zhichun Road, Haidian District, Beijing 100080, China; e-mail: t-Johnc@microsoft.com
  • AT&T Labs-Research, P.O. Box 971, 180 Park Avenue, Florham Park, NJ 07932, USA; e-mail: srini@research.att.com
  • Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA; e-mail: vijay@cis.udel.edu

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2006

Abstract

There has been a recent surge of interest in stochastic models of parsing. The use of tree-adjoining grammar (TAG) in this domain has been relatively limited, due in part to the unavailability, until recently, of large-scale corpora hand-annotated with TAG structures. Our goals are to develop inexpensive means of generating such corpora and to demonstrate their applicability to stochastic modeling. We present a method for automatically extracting a linguistically plausible TAG from the Penn Treebank. We also introduce methods that require little manual labor for inducing a higher-level organization of TAGs. Empirically, we evaluate various automatically extracted TAGs and demonstrate how the induced higher-level organization of TAGs can be used to smooth stochastic TAG models.