Towards a more careful evaluation of broad coverage parsing systems

Authors:
Wide R. Hogenhout;Yuji Matsumoto
Affiliations:
Nara Institute of Science and Technology, Nara, Japan;Nara Institute of Science and Technology, Nara, Japan
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Year:
1996

Citing 7
Cited 1

Procedure for quantitatively comparing the syntactic coverage of English grammars

HLT '91 Proceedings of the workshop on Speech and Natural Language
Using an annotated corpus as a stochastic grammar

EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Parsing the Wall Street Journal with the inside-outside algorithm

EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Towards history-based grammars: using richer models for probabilistic parsing

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Statistical decision-tree models for parsing

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Inside-outside reestimation from partially bracketed corpora

ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
A dependency-based method for evaluating broad-coverage parsers

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2

Unsupervised evaluation of parser robustness

CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Since treebanks have become available to researchers a wide variety of techniques has been used to make broad coverage parsing systems. This makes quantitative evaluation very important, but the current evaluation methods have a number of drawbacks such as arbitrary choices in the treebank and the difficulty in measuring statistical significance. We suggest a more detailed method for testing a parsing system using constituent boundaries, with a number of measures that give more information than current measures, and evaluate the quality of the test. We also show that statistical significance cannot be calculated in a straightforward way, and suggest a calculation method for the case of Bracket Recall.