We present a methodology for extracting subcategorization frames based on an automatic lexical-functional grammar (LFG) f-structure annotation algorithm for the Penn-II and Penn-III Treebanks. We extract syntactic-function-based subcategorization frames (LFG semantic forms) and traditional CFG category-based subcategorization frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. In contrast to many other approaches, ours does not predefine the subcategorization frame types extracted, learning them instead from the source data. Including particles and prepositions, we extract 21,005 lemma frame types for 4,362 verb lemmas, with a total of 577 frame types and an average of 4.8 frame types per verb. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource. To our knowledge, this is the largest and most complete evaluation of subcategorization frames acquired automatically for English.
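To make "probabilities with frames conditional on the lemma" concrete, here is a minimal sketch that assumes a simple relative-frequency estimate, P(frame | lemma) = count(lemma, frame) / count(lemma). The abstract does not state which estimator is used, so that choice is an assumption. The function name `frame_probabilities`, the string encoding of frames, and the toy observations are all hypothetical and are not taken from the extracted lexicon.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency (an assumed estimator).

    observations: iterable of (lemma, frame) pairs, one pair per verb token.
    Returns a dict mapping lemma -> {frame: probability}.
    """
    pair_counts = Counter()
    lemma_counts = Counter()
    for lemma, frame in observations:
        pair_counts[(lemma, frame)] += 1
        lemma_counts[lemma] += 1

    probs = defaultdict(dict)
    for (lemma, frame), n in pair_counts.items():
        probs[lemma][frame] = n / lemma_counts[lemma]
    return dict(probs)


# Invented toy data. Frames are kept as opaque strings, so new frame types are
# picked up from the data rather than chosen from a predefined inventory.
# Active and passive frames are kept apart by a hypothetical prefix.
observations = [
    ("give", "active:[subj,obj,obj2]"),
    ("give", "active:[subj,obj,obl:to]"),
    ("give", "active:[subj,obj,obj2]"),
    ("give", "passive:[subj]"),
    ("rely", "active:[subj,obl:on]"),
]

probs = frame_probabilities(observations)

# Average number of distinct frame types per lemma in the toy data.
avg_frame_types = sum(len(frames) for frames in probs.values()) / len(probs)
```

The same ratio applied to the figures reported above, 21,005 lemma frame types over 4,362 verb lemmas, gives approximately 4.8 frame types per verb.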