Information Retrieval
The statistical significance of the MUC-4 results
MUC4 '92 Proceedings of the 4th conference on Message understanding
An Algorithm that Learns What‘s in a Name
Machine Learning - Special issue on natural language learning
An empirically based system for processing definite descriptions
Computational Linguistics
Machine learning-based named entity recognition via effective integration of various evidences
Natural Language Engineering
Named entity recognition using an HMM-based chunk tagger
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
FeatureEng '05 Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing
Global models of document structure using latent permutations
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Learning document-level semantic properties from free-text annotations
Journal of Artificial Intelligence Research
Content modeling using latent permutations
Journal of Artificial Intelligence Research
Faster parsing by supertagger adaptation
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Maximum metric score training for coreference resolution
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Hi-index | 0.00 |
The results of the MUC-6 evaluation must be analyzed to determine whether close scores significantly distinguish systems or whether the differences in those scores are a matter of chance. In order to do such an analysis, a method of computer intensive hypothesis testing was developed by SAIC for the MUC-3 results and has been used for distinguishing MUC scores since that time. The implementation of this method for the MUC evaluations was first described in [1] and later the concepts behind the statistical model were explained in a more understandable manner in [2]. This paper gives the results of the statistical testing for the three MUC-6 tasks where a single metric could be associated with a system's performance.