Automated extraction of Tree-Adjoining Grammars from treebanks

Authors:
John Chen;Srinivas Bangalore;K. Vijay-Shanker
Affiliations:
Microsoft Research Asia, No. 49 Zhichun Road, Haidian District, Beijing 100080, China, e-mail: t-Johnc@microsoft.com; AT&T Labs-Research, P.O. Box 971, 180 Park Avenue, Florham Park, NJ 07932, USA, e-mail: srini@research.att.com; Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA, e-mail: vijay@cis.udel.edu
Venue:
Natural Language Engineering
Year:
2006

Abstract

Interest in stochastic models of parsing has surged in recent years. Tree-adjoining grammar (TAG) has seen relatively limited use in this area, due in part to the unavailability, until recently, of large-scale corpora hand-annotated with TAG structures. Our goals are to develop inexpensive means of generating such corpora and to demonstrate their applicability to stochastic modeling. We present a method for automatically extracting a linguistically plausible TAG from the Penn Treebank. We also introduce low-labor methods for inducing higher-level organization of TAGs. Empirically, we evaluate various automatically extracted TAGs and demonstrate how the induced higher-level organization can be used to smooth stochastic TAG models.