Compacting the Penn Treebank grammar
COLING '98 Proceedings of the 17th International Conference on Computational Linguistics - Volume 1
Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad-coverage grammars. In the simplest case, rules can be 'read off' the parse annotations of the corpus, producing either a simple or a probabilistic context-free grammar. Such grammars, however, can be very large, which raises the computational cost of subsequent parsing with the grammar. In this paper, we explore ways in which a treebank grammar can be reduced in size, or 'compacted', using two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that, by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall at some loss in precision.
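The first step the abstract describes — reading a grammar off parse annotations and thresholding rules by occurrence count — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy trees, the nested-tuple tree encoding, and the threshold value are all assumptions made for the example, and the probabilities are plain relative-frequency estimates over the surviving rules.

```python
from collections import Counter

# Toy parse trees as nested tuples: (label, child, child, ...);
# leaves are plain strings. Illustrative only, not Penn Treebank data.
TREES = [
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VB", "barks"))),
    ("S", ("NP", ("DT", "a"), ("NN", "cat")), ("VP", ("VB", "sleeps"))),
    ("S", ("NP", ("NN", "rain")), ("VP", ("VB", "falls"))),
]

def read_off_rules(tree, counts):
    """Record one CFG production per internal node, recursively."""
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            read_off_rules(c, counts)

counts = Counter()
for t in TREES:
    read_off_rules(t, counts)

# Thresholding: keep only rules observed at least `threshold` times.
threshold = 2
compacted = {rule: n for rule, n in counts.items() if n >= threshold}

# Relative-frequency PCFG estimate over the surviving rules.
lhs_totals = Counter()
for (lhs, _), n in compacted.items():
    lhs_totals[lhs] += n
pcfg = {rule: n / lhs_totals[rule[0]] for rule, n in compacted.items()}
```

On these three toy trees, thresholding at 2 discards every lexical rule and the rare NP -> NN production, keeping only S -> NP VP, NP -> DT NN, and VP -> VB; in a real treebank grammar the discarded low-frequency rules are what makes the size reduction substantial.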