Upper bounds for unsupervised parsing with unambiguous non-terminally separated grammars

Authors:
Franco M. Luque; Gabriel Infante-Lopez
Affiliations:
Universidad Nacional de Córdoba & CONICET, Argentina; Universidad Nacional de Córdoba & CONICET, Argentina
Venue:
CLAGI '09 Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference
Year:
2009

Abstract

Unambiguous Non-Terminally Separated (UNTS) grammars have properties that make them attractive for grammatical inference. However, these properties say nothing about the maximal performance such grammars can achieve when they are evaluated against a gold treebank that was not produced by a UNTS grammar. In this paper we investigate such an upper bound. We develop a method to find an upper bound for the unlabeled F1 performance that any UNTS grammar can achieve over a given treebank. Our strategy is to characterize all possible versions of the gold treebank that UNTS grammars can produce and to find the one that optimizes a metric we define. We show a way to translate this score into an upper bound for the F1. In particular, we show that the F1 parsing score of any UNTS grammar cannot exceed 82.2% when the gold treebank is the WSJ10 corpus.
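For readers unfamiliar with the metric being bounded, the following is a minimal sketch of how an unlabeled F1 score over a treebank is commonly computed. It is not taken from the paper: the function name `unlabeled_f1`, the representation of each tree as a set of `(start, end)` token spans, and the micro-averaging over sentences are all assumptions made for illustration. The paper's exact evaluation conventions (for example, which brackets are counted) are not stated in this record.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a bracket is a (start, end) token span,
# end exclusive, and each sentence's tree is the set of its spans.
Span = Tuple[int, int]


def unlabeled_f1(gold: Iterable[Set[Span]], predicted: Iterable[Set[Span]]) -> float:
    """Micro-averaged unlabeled F1 between two treebanks (assumed convention)."""
    matched = gold_total = pred_total = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        matched += len(gold_spans & pred_spans)  # brackets present in both trees
        gold_total += len(gold_spans)
        pred_total += len(pred_spans)
    if matched == 0:
        return 0.0
    precision = matched / pred_total
    recall = matched / gold_total
    return 2 * precision * recall / (precision + recall)


# Toy one-sentence treebank: two of the three brackets agree.
gold_treebank = [{(0, 2), (2, 5), (3, 5)}]
parsed_treebank = [{(0, 2), (2, 5), (2, 4)}]
score = unlabeled_f1(gold_treebank, parsed_treebank)
print(f"unlabeled F1 = {score:.3f}")
```

In this toy case precision and recall are both 2/3, so the F1 is also 2/3. The bound reported in the abstract is a limit on this kind of score for any UNTS grammar evaluated on WSJ10.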