The present use of statistics in the evaluation of NLP parsers

  • Authors:
  • J. Entwisle; D. M. W. Powers

  • Affiliations:
  • Flinders University of South Australia; Flinders University of South Australia

  • Venue:
  • NeMLaP3/CoNLL '98 Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning
  • Year:
  • 1998


Abstract

We are concerned that the quality of results produced by an NLP parser bears little, if any, relation to the percentage scores claimed for the various NLP parser systems presently available. To illustrate this problem, we examine one readily available NLP tagging and parsing system, the ENGCG parser, and one tagger, the Brill tagger, noting their responses to both artificially generated and naturally occurring text. The published percentage assessments are methodologically flawed and should be treated with caution; instead, the performance of an NLP parser should be assessed by the user, and solely from a consideration of the parses produced for exactly the input that the user chooses to contribute for the assessment. Careful attention to the input of whatever corpus the user decides on is presently the only suitable qualifying test of parsing ability. None of the available parsers is yet perfect, despite apparent accuracies now quoted at 99% or better. We consider the impact of Zipf's argument of 'least effort' on percentage assessment, and we open a discussion on estimating the relative complexities of corpora.
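
The abstract's closing point invokes Zipf's 'least effort' argument, under which the frequency of the r-th most common word falls off roughly as 1/r, as a lens on how hard a given corpus is. As a rough illustration only (not drawn from the paper; the file name, whitespace tokenisation, and least-squares fit are all assumptions), the following Python sketch estimates a Zipf exponent for a corpus, one crude way such a relative-complexity comparison might be made:

```python
# A minimal sketch, assuming a plain-text corpus in "corpus.txt" and
# simple whitespace tokenisation. Fits the slope of log(frequency)
# against log(rank); Zipf's law predicts an exponent near 1.
import math
from collections import Counter

def zipf_exponent(tokens):
    """Least-squares slope of log(frequency) vs. log(rank), negated."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return -num / den

with open("corpus.txt") as fh:
    tokens = fh.read().lower().split()
print(f"Estimated Zipf exponent: {zipf_exponent(tokens):.2f}")
```

Comparing fitted exponents (or tail lengths of the rank-frequency curve) across two corpora gives one plausible, if coarse, ordering of their relative complexities; the paper itself only opens this discussion rather than prescribing a measure.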