Bounding the maximal parsing performance of non-terminally separated grammars
ICGI'10 Proceedings of the 10th international colloquium conference on Grammatical inference: theoretical results and applications
Unambiguous Non-Terminally Separated (UNTS) grammars have properties that make them attractive for grammatical inference. However, these properties say nothing about the maximal performance such grammars can achieve when they are evaluated against a gold treebank that was not produced by an UNTS grammar. In this paper we investigate that upper bound. We develop a method to find an upper bound on the unlabeled F1 performance that any UNTS grammar can achieve over a given treebank. Our strategy is to characterize all possible versions of the gold treebank that UNTS grammars can produce and to find the one that optimizes a metric we define. We then show how to translate this score into an upper bound on the F1. In particular, we show that the F1 parsing score of any UNTS grammar cannot exceed 82.2% when the gold treebank is the WSJ10 corpus.
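
For readers unfamiliar with the metric being bounded, the following is a minimal sketch of how unlabeled F1 over bracketings is commonly computed. It is not the paper's method for deriving the bound. It assumes each tree is represented as a set of (start, end) word spans and that counts are pooled over the whole corpus; the function name unlabeled_f1 and the toy spans are hypothetical, and evaluation conventions (for example, whether whole-sentence or single-word spans are counted) vary between papers.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) word offsets of one bracket, end exclusive


def unlabeled_f1(gold: List[Set[Span]],
                 proposed: List[Set[Span]]) -> Tuple[float, float, float]:
    """Return corpus-level unlabeled (precision, recall, F1).

    Assumption: each sentence is given as a set of spans, so duplicate
    (unary) brackets are already collapsed, and counts are summed over
    all sentences before the ratios are taken.
    """
    if len(gold) != len(proposed):
        raise ValueError("gold and proposed must cover the same sentences")

    matched = sum(len(g & p) for g, p in zip(gold, proposed))
    n_gold = sum(len(g) for g in gold)
    n_proposed = sum(len(p) for p in proposed)

    precision = matched / n_proposed if n_proposed else 0.0
    recall = matched / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: one five-word sentence with hypothetical bracketings.
gold_spans = [{(0, 5), (0, 2), (2, 5), (3, 5)}]
proposed_spans = [{(0, 5), (0, 2), (2, 4)}]
precision, recall, f1 = unlabeled_f1(gold_spans, proposed_spans)
```

In the toy example two of the three proposed brackets appear in the gold tree, and two of the four gold brackets are recovered, so precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.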