The statistical significance of the MUC-4 results

Authors: Nancy Chinchor
Affiliations: Science Applications International Corporation, San Diego, CA
Venue: MUC4 '92 Proceedings of the 4th conference on Message understanding
Year: 1992

Abstract

The performance of the systems participating in MUC-4 is measured by their recall, precision, and F-measure scores. A difference in scores between any two systems may be due to chance, or it may reflect a significant difference between the two systems. Statistical hypothesis testing is used to rule out the possibility that the difference is due to chance. The test used is approximate randomization, a computationally intensive method. This paper describes the method and reports the statistical significance of the results for the two MUC-4 test sets, TST3 and TST4.
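
The general idea of approximate randomization can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes that each system has one score per test message (for example, a per-message F-measure) and that the test statistic is the absolute difference in mean score; the statistic actually used for MUC-4 may differ. The function name, parameter names, and default number of shuffles are hypothetical.

```python
import random


def approximate_randomization(scores_a, scores_b, trials=9999, seed=0):
    """Estimate a two-sided p-value for the difference in mean score
    between two systems scored on the same messages.

    Assumption (not from the paper): one score per message per system,
    with the absolute difference in means as the test statistic.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same messages")
    if not scores_a:
        raise ValueError("at least one message is required")

    rng = random.Random(seed)
    n = len(scores_a)

    def mean_abs_diff(pairs):
        # Absolute difference between the two systems' mean scores.
        return abs(sum(a - b for a, b in pairs)) / n

    pairs = list(zip(scores_a, scores_b))
    observed = mean_abs_diff(pairs)

    at_least_as_extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the system labels are exchangeable,
        # so each message's pair of scores is swapped with probability 0.5.
        shuffled = [(b, a) if rng.random() < 0.5 else (a, b) for a, b in pairs]
        if mean_abs_diff(shuffled) >= observed:
            at_least_as_extreme += 1

    # The add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

Calling `approximate_randomization(scores_a, scores_b)` on two equal-length lists of per-message scores returns an estimated p-value. A small value suggests that the observed difference is unlikely to arise from chance alone, under the assumptions stated above.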