Significance tests for the evaluation of ranking methods

Authors:
Stefan Evert
Affiliations:
Universität Stuttgart, Stuttgart, Germany
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Citing 5
Cited 3

Foundations of statistical natural language processing

Foundations of statistical natural language processing
Accurate methods for the statistics of surprise and coincidence

Computational Linguistics - Special issue on using large corpora: I
More accurate tests for the statistical significance of result differences

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Methods for the qualitative evaluation of lexical association measures

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Text analysis meets computational lexicography

COLING '04 Proceedings of the 20th international conference on Computational Linguistics

Computing word similarity and identifying cognates with pair hidden Markov models

CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Two-Word Collocation Extraction Using Monolingual Word Alignment Method

ACM Transactions on Intelligent Systems and Technology (TIST)
A probabilistic interpretation of precision, recall and F-score, with implication for evaluation

ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents a statistical model that interprets the evaluation of ranking methods as a random experiment. This model predicts the variability of evaluation results, so that appropriate significance tests for the results can be derived. The paper concludes with an empirical validation of the model on a collocation extraction task.