Procedure for quantitatively comparing the syntactic coverage of English grammars
HLT '91 Proceedings of the workshop on Speech and Natural Language
The EMILE 4.1 Grammar Induction Toolbox
ICGI '02 Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications
Building a large annotated corpus of English: The Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
COLING '00 Proceedings of the 18th Conference on Computational Linguistics - Volume 2
Corpus-based induction of syntactic structure: models of dependency and constituency
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Problems with Evaluation of Unsupervised Empirical Grammatical Inference Systems
ICGI '08 Proceedings of the 9th international colloquium on Grammatical Inference: Algorithms and Applications
Unsupervised parsing with U-DOP
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Upper bounds for unsupervised parsing with unambiguous non-terminally separated grammars
CLAGI '09 Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference
PAC-learning unambiguous k,l-NTS≤ languages
ICGI'10 Proceedings of the 10th International Colloquium on Grammatical Inference: Theoretical Results and Applications
PAC-learning unambiguous NTS languages
ICGI'06 Proceedings of the 8th International Colloquium on Grammatical Inference: Algorithms and Applications
Unambiguous Non-Terminally Separated (UNTS) grammars have good learnability properties but are too restrictive to be used for natural language parsing. We present a generalization of UNTS grammars, called Unambiguous Weakly NTS (UWNTS) grammars, that preserves the learnability properties. We then study the problem of using them to parse natural language and evaluating the result against a gold treebank. If the target language is not UWNTS, there is an upper bound on the parsing performance. In this paper we develop methods to find upper bounds on the unlabeled F1 score that any UWNTS grammar can achieve over a given treebank. We define a new metric, show that its optimization is NP-hard but solvable with specialized software, and show how to translate the result into a bound on the F1. We run experiments on the WSJ10 corpus, finding an F1 bound of 76.1% for UWNTS grammars over the POS tag alphabet.
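The evaluation the abstract refers to is the standard unlabeled bracketing F1: constituent labels are discarded, and predicted spans are matched against gold spans. As an illustrative sketch only (the function name and span representation are assumptions, not taken from the paper), the metric can be computed like this:

```python
def unlabeled_f1(gold_brackets, predicted_brackets):
    """Unlabeled bracketing F1 over a treebank.

    Each argument is a list with one set of (start, end) spans per
    sentence; constituent labels are ignored, only span boundaries count.
    """
    # Count spans that appear in both the gold and the predicted tree.
    matched = sum(len(g & p) for g, p in zip(gold_brackets, predicted_brackets))
    gold_total = sum(len(g) for g in gold_brackets)
    pred_total = sum(len(p) for p in predicted_brackets)
    precision = matched / pred_total if pred_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of unlabeled precision and recall.
    return 2 * precision * recall / (precision + recall)
```

For example, a one-sentence treebank with gold spans {(0,3), (0,1)} and predicted spans {(0,3), (1,3)} shares one span, giving precision = recall = 0.5 and F1 = 0.5. An upper bound of the kind the paper derives says no UWNTS grammar can push this score above 76.1% on WSJ10.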