The need for accurate alignment in natural language system evaluation

Authors:
Andrew Kehler;John Bear;Douglas Appelt
Affiliations:
University of California, San Diego, La Jolla, CA;Artificial Intelligence Center, Menlo Park, CA;Artificial Intelligence Center, Menlo Park, CA
Venue:
Computational Linguistics
Year:
2001

Abstract

As evaluations of computational linguistics technology progress toward higher-level interpretation tasks, the problem of determining alignments between system responses and answer key entries may become less straightforward. We present an extensive analysis of the alignment procedure used in the MUC-6 evaluation of information extraction technology, which reveals effects that interfere with the stated goals of the evaluation. These effects are shown to be pervasive enough that they have the potential to adversely impact the technology development process. These results argue strongly for the use of accurate alignment criteria in natural language evaluations, and for maintaining the independence of alignment criteria and mechanisms used to calculate scores.