The statistical significance of the MUC-4 results

  • Authors:
  • Nancy Chinchor

  • Affiliations:
  • Science Applications International Corporation, San Diego, CA

  • Venue:
  • MUC4 '92 Proceedings of the 4th conference on Message understanding
  • Year:
  • 1992

Abstract

The MUC-4 scores of recall, precision, and the F-measures are used to measure the performance of the participating systems. The difference in scores between any two systems may be due to chance or to a significant difference between the two systems. To rule out the possibility that the difference is due to chance, statistical hypothesis testing is used. The hypothesis-testing method used is a computationally intensive method known as approximate randomization. This paper discusses the method and the statistical significance of the results for the two MUC-4 test sets, TST3 and TST4.
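The abstract names approximate randomization without spelling it out, so a minimal Python sketch of the general technique may help. It assumes each system's output has been reduced to hypothetical per-message tallies `(correct, actual, possible)` and that recall, precision, and F are computed from counts pooled over all messages. MUC-4's partial-credit scoring is deliberately omitted. On each shuffle, the two systems' tallies for a message are swapped with probability 1/2, and the significance level is estimated as (nc + 1) / (NS + 1), where nc counts shuffles whose absolute F difference is at least the observed one and NS is the number of shuffles. The names `f_measure` and `approximate_randomization` are illustrative. This is a sketch of the standard recipe, not the paper's exact procedure.

```python
import random
from typing import List, Tuple

# Hypothetical per-message tally: (correct, actual, possible).
# Assumption: no partial credit, unlike the real MUC-4 scorer.
Tally = Tuple[int, int, int]


def f_measure(tallies: List[Tally], beta: float = 1.0) -> float:
    """Pool counts over all messages, then combine recall and precision into F."""
    correct = sum(t[0] for t in tallies)
    actual = sum(t[1] for t in tallies)
    possible = sum(t[2] for t in tallies)
    recall = correct / possible if possible else 0.0
    precision = correct / actual if actual else 0.0
    if recall == 0.0 and precision == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


def approximate_randomization(sys_a: List[Tally], sys_b: List[Tally],
                              shuffles: int = 9999, seed: int = 0) -> float:
    """Estimate the significance level of the F difference between two systems."""
    if len(sys_a) != len(sys_b):
        raise ValueError("both systems must be scored on the same messages")
    rng = random.Random(seed)
    observed = abs(f_measure(sys_a) - f_measure(sys_b))
    at_least_as_extreme = 0
    for _ in range(shuffles):
        shuf_a, shuf_b = [], []
        # Under the null hypothesis the system labels are exchangeable per message,
        # so each message's pair of tallies is swapped with probability 1/2.
        for ta, tb in zip(sys_a, sys_b):
            if rng.random() < 0.5:
                ta, tb = tb, ta
            shuf_a.append(ta)
            shuf_b.append(tb)
        if abs(f_measure(shuf_a) - f_measure(shuf_b)) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the unshuffled assignment itself, keeping the estimate valid.
    return (at_least_as_extreme + 1) / (shuffles + 1)
```

For example, `approximate_randomization(tallies_a, tallies_b)` returning 0.03 would suggest that a difference at least this large arises by chance in roughly 3% of random reassignments. Pooling counts before computing F, rather than averaging per-message scores, is why the test shuffles tallies instead of per-message F values.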