Large-scale induction and evaluation of lexical resources from the Penn-II treebank

Authors:
Ruth O'Donovan;Michael Burke;Aoife Cahill;Josef van Genabith;Andy Way
Affiliations:
Dublin City University, Dublin, Ireland (all authors)
Venue:
ACL '04: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
Year:
2004

Abstract

In this paper we present a methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank. We extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG category-based subcategorisation frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach does not predefine frames, associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. We extract 3586 verb lemmas, 14348 semantic form types (an average of 4 per lemma) with 577 frame types. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource.
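
To make concrete what "associates probabilities with frames conditional on the lemma" could mean, here is a minimal sketch. It assumes a plain relative-frequency estimate, P(frame | lemma) = count(lemma, frame) / count(lemma); the abstract does not state which estimator is used. The function name `frame_probabilities`, the tuple encoding of frames, the labels such as `obl:to`, and the toy observations are all hypothetical illustrations, not data or notation from the paper.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency.

    `observations` is an iterable of (lemma, frame) pairs, where a frame
    is a tuple of grammatical-function labels (a hypothetical encoding).
    """
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1

    probs = {}
    for lemma, frame_counts in counts.items():
        total = sum(frame_counts.values())
        # Each frame's probability is its share of the lemma's occurrences.
        probs[lemma] = {f: c / total for f, c in frame_counts.items()}
    return probs


# Hypothetical toy observations; not taken from the Penn-II Treebank.
observations = [
    ("give", ("subj", "obj", "obj2")),
    ("give", ("subj", "obj", "obl:to")),
    ("give", ("subj", "obj", "obl:to")),
    ("give", ("subj", "obj")),
    ("sleep", ("subj",)),
]

probs = frame_probabilities(observations)
for lemma, dist in probs.items():
    for frame, p in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(f"{lemma}({', '.join(frame)})  {p:.2f}")
```

Because frames are counted as they are observed rather than matched against a fixed inventory, nothing here predefines the frame set. The active/passive distinction mentioned in the abstract could be modelled in this sketch by adding a voice marker to the frame tuple.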