Procedure for quantitatively comparing the syntactic coverage of English grammars
HLT '91 Proceedings of the workshop on Speech and Natural Language
The EMILE 4.1 Grammar Induction Toolbox
ICGI '02 Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications
Building a large annotated corpus of English: The Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
COLING '00 Proceedings of the 18th Conference on Computational Linguistics - Volume 2
Corpus-based induction of syntactic structure: models of dependency and constituency
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Problems with Evaluation of Unsupervised Empirical Grammatical Inference Systems
ICGI '08 Proceedings of the 9th international colloquium on Grammatical Inference: Algorithms and Applications
Unsupervised parsing with U-DOP
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Upper bounds for unsupervised parsing with unambiguous non-terminally separated grammars
CLAGI '09 Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference
PAC-learning unambiguous k,l-NTS≤ languages
ICGI'10 Proceedings of the 10th International Colloquium on Grammatical Inference: Theoretical Results and Applications
PAC-learning unambiguous NTS languages
ICGI'06 Proceedings of the 8th International Colloquium on Grammatical Inference: Algorithms and Applications
Unambiguous Non-Terminally Separated (UNTS) grammars have good learnability properties but are too restrictive to be used for natural language parsing. We present a generalization of UNTS grammars, called Unambiguous Weakly NTS (UWNTS) grammars, that preserves the learnability properties. We then study the problem of using them to parse natural language and evaluating the result against a gold treebank. If the target language is not UWNTS, there is an upper bound on the parsing performance. In this paper we develop methods to find upper bounds on the unlabeled F1 score that any UWNTS grammar can achieve over a given treebank. We define a new metric, show that its optimization is NP-hard but solvable with specialized software, and show how to translate the result into a bound on the F1. We run experiments on the WSJ10 corpus, finding an F1 bound of 76.1% for UWNTS grammars over the POS tag alphabet.
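The evaluation the abstract refers to is the standard unlabeled bracketing F1: constituent labels are discarded, and predicted spans are matched against gold spans. As an illustrative sketch only (the function name and span representation are assumptions, not taken from the paper), the metric can be computed like this:

```python
def unlabeled_f1(gold_brackets, predicted_brackets):
    """Unlabeled bracketing F1 over a treebank.

    Each argument is a list with one set of (start, end) spans per
    sentence; constituent labels are ignored, only span boundaries count.
    """
    # Count spans that appear in both the gold and the predicted tree.
    matched = sum(len(g & p) for g, p in zip(gold_brackets, predicted_brackets))
    gold_total = sum(len(g) for g in gold_brackets)
    pred_total = sum(len(p) for p in predicted_brackets)
    precision = matched / pred_total if pred_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of unlabeled precision and recall.
    return 2 * precision * recall / (precision + recall)
```

For example, a one-sentence treebank with gold spans {(0,3), (0,1)} and predicted spans {(0,3), (1,3)} shares one span, giving precision = recall = 0.5 and F1 = 0.5. An upper bound of the kind the paper derives says no UWNTS grammar can push this score above 76.1% on WSJ10.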