An information-theoretic measure to evaluate parsing difficulty across treebanks

Authors:
Anna Corazza;Alberto Lavelli;Giorgio Satta
Affiliations:
Università di Napoli “Federico II”, Italy;FBK-irst, Trento, Italy;Università di Padova, Italy
Venue:
ACM Transactions on Speech and Language Processing (TSLP)
Year:
2013

Citing 36
Cited 0

Procedure for quantitatively comparing the syntactic coverage of English grammars

HLT '91 Proceedings of the workshop on Speech and Natural Language
Elements of information theory

Elements of information theory
An estimate of an upper bound for the entropy of English

Computational Linguistics
A maximum entropy approach to natural language processing

Computational Linguistics
Inducing Features of Random Fields

IEEE Transactions on Pattern Analysis and Machine Intelligence
Statistical methods for speech recognition

Statistical methods for speech recognition
Foundations of statistical natural language processing

Foundations of statistical natural language processing
Statistical Language Learning

Statistical Language Learning
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Estimation of probabilistic context-free grammars

Computational Linguistics
PCFG models of linguistic tree representations

Computational Linguistics
A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
An annotation scheme for free word order languages

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Probabilistic tree-adjoining grammar as a framework for statistical natural language processing

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Stochastic lexicalized tree-adjoining grammars

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Three new probabilistic models for dependency parsing: an exploration

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Estimators for stochastic "Unification-Based" grammars

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Immediate-head parsing for language models

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Joint and conditional estimation of tagging and parsing models

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Accurate unlexicalized parsing

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Measures and models for phrase recognition

HLT '93 Proceedings of the workshop on Human Language Technology
Intricacies of Collins' Parsing Model

Computational Linguistics
Sample Selection for Statistical Parsing

Computational Linguistics
Head-Driven Statistical Models for Natural Language Parsing

Computational Linguistics
Two statistical parsing models applied to the Chinese Treebank

CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
Conditional structure versus conditional estimation in NLP models

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Discriminative training of a neural network statistical parser

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Lexicalization in crosslinguistic probabilistic parsing: the case of French

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Learning accurate, compact, and interpretable tree annotation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Solution of an Open Problem on Probabilistic Grammars

IEEE Transactions on Computers
Annotation schemes and their influence on parsing results

COLING ACL '06 Proceedings of the 21st International Conference on computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Is it really that difficult to parse German?

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Predicting success in machine translation

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Tree-bank grammars

AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 2
Linguistic Structure Prediction

Linguistic Structure Prediction

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the growing interest in statistical parsing, special attention has recently been devoted to the problem of comparing different treebanks to assess which languages or domains are more difficult to parse relative to a given model. A common methodology for comparing parsing difficulty across treebanks is based on the use of the standard labeled precision and recall measures. As an alternative, in this article we propose an information-theoretic measure, called the expected conditional cross-entropy (ECC). One important advantage with respect to standard performance measures is that ECC can be directly expressed as a function of the parameters of the model. We evaluate ECC across several treebanks for English, French, German, and Italian, and show that ECC is an effective measure of parsing difficulty, with an increase in ECC always accompanied by a degradation in parsing accuracy.