An information-theoretic measure to evaluate parsing difficulty across treebanks

  • Authors:
  • Anna Corazza; Alberto Lavelli; Giorgio Satta

  • Affiliations:
  • Università di Napoli “Federico II”, Italy; FBK-irst, Trento, Italy; Università di Padova, Italy

  • Venue:
  • ACM Transactions on Speech and Language Processing (TSLP)
  • Year:
  • 2013

Abstract

With the growing interest in statistical parsing, special attention has recently been devoted to the problem of comparing different treebanks to assess which languages or domains are more difficult to parse relative to a given model. A common methodology for comparing parsing difficulty across treebanks is based on the standard labeled precision and recall measures. As an alternative, in this article we propose an information-theoretic measure, called the expected conditional cross-entropy (ECC). One important advantage over standard performance measures is that ECC can be directly expressed as a function of the parameters of the model. We evaluate ECC across several treebanks for English, French, German, and Italian, and show that ECC is an effective measure of parsing difficulty, with an increase in ECC always accompanied by a degradation in parsing accuracy.
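To give a rough sense of what a conditional cross-entropy over parse trees measures, here is a minimal sketch. It is not the authors' ECC definition, which the abstract states is computed directly from the model parameters. Instead, it is a generic empirical estimate that averages -log2 p(t | w) over a sample of sentences w paired with their treebank trees t. The input format (a hypothetical dictionary mapping each candidate parse of a sentence to the model's joint probability p(t, w), plus the id of the gold parse) and the function name `conditional_cross_entropy` are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input format (an assumption, not from the article):
# for each sentence w, a dict mapping each candidate parse id t to the
# model's joint probability p(t, w), together with the gold parse id.
Sentence = Tuple[Dict[str, float], str]


def conditional_cross_entropy(sample: List[Sentence]) -> float:
    """Empirical conditional cross-entropy in bits: mean of -log2 p(gold | w)."""
    total = 0.0
    for joint_probs, gold in sample:
        p_w = sum(joint_probs.values())           # p(w) = sum over t of p(t, w)
        p_gold_given_w = joint_probs[gold] / p_w  # p(t_gold | w) = p(t_gold, w) / p(w)
        total -= math.log2(p_gold_given_w)
    return total / len(sample)


if __name__ == "__main__":
    sample = [
        ({"t1": 0.3, "t2": 0.1}, "t1"),  # gold parse gets 3/4 of the conditional mass
        ({"a": 0.2, "b": 0.2}, "b"),     # two equally likely parses: 1 bit
    ]
    print(conditional_cross_entropy(sample))
```

Under this sketch, the value grows as the model spreads conditional probability away from the treebank trees. A sentence whose gold parse is the model's only candidate contributes 0 bits.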