Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
PCFG models of linguistic tree representations
Computational Linguistics
Automatic extraction of subcategorization from corpora
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Three generative, lexicalised models for statistical parsing
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Automatic acquisition of subcategorization frames from untagged text
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Automatic acquisition of a large subcategorization dictionary from corpora
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Inside-outside estimation of a lexicalized PCFG for German
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Accurate unlexicalized parsing
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Inside-outside reestimation from partially bracketed corpora
HLT '91 Proceedings of the workshop on Speech and Natural Language
Intricacies of Collins' Parsing Model
Computational Linguistics
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Trace prediction and recovery with unlexicalized PCFGs and slash features
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Efficient parsing of highly ambiguous context-free grammars with bit vectors
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Effective self-training for parsing
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
IWPT '07 Proceedings of the 10th International Conference on Parsing Technologies
Smoothing fine-grained PCFG lexicons
IWPT '09 Proceedings of the 11th International Conference on Parsing Technologies
Learning structural dependencies of words in the Zipfian tail
IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies
French parsing enhanced with a word clustering method based on a syntactic lexicon
SPMRL '11 Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages
Subcat-LMF: fleshing out a standardized format for subcategorization frame interoperability
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Hi-index | 0.00 |
We present procedures which pool lexical information estimated from unlabeled data via the Inside-Outside algorithm, with lexical information from a treebank PCFG. The procedures produce substantial improvements (up to 31.6% error reduction) on the task of determining subcategorization frames of novel verbs, relative to a smoothed Penn Treebank-trained PCFG. Even with relatively small quantities of unlabeled training data, the re-estimated models show promising improvements in labeled bracketing f-scores on Wall Street Journal parsing, and substantial benefit in acquiring the subcategorization preferences of low-frequency verbs.