Rank correlation statistics are useful for determining whether there is a correspondence between two measurements, particularly when the measures themselves are of less interest than their relative ordering. Kendall's τ in particular has found use in Information Retrieval as a "meta-evaluation" measure: it has been used to compare evaluation measures, evaluate system rankings, and evaluate predicted performance. In the meta-evaluation domain, however, correlations between systems confound relationships between measurements, practically guaranteeing a positive and significant estimate of τ regardless of any actual correlation between the measurements. We introduce an alternative measure of distance between rankings that corrects this by explicitly accounting for correlations between systems over a sample of topics, and that moreover has a probabilistic interpretation for use in a test of statistical significance. We validate our measure with theory, simulated data, and experiment.
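As a point of reference, the following is a minimal sketch of the standard Kendall's τ (the tau-a variant, with no tie correction) applied to two rankings of the same systems. It illustrates the conventional meta-evaluation use described above, not the alternative measure this paper introduces. The function name `kendall_tau` and the system scores are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists.

    Counts concordant and discordant pairs; tied pairs count as neither.
    """
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two items to compare")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product says whether both measures order the pair alike.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores for five systems under two evaluation measures.
measure_a = [0.31, 0.28, 0.25, 0.22, 0.18]
measure_b = [0.52, 0.55, 0.47, 0.40, 0.41]

tau = kendall_tau(measure_a, measure_b)
print(f"tau = {tau:.2f}")
```

Here 8 of the 10 system pairs are ordered the same way by both measures and 2 are swapped, giving τ = (8 - 2) / 10 = 0.6. This computation treats every system pair as independent. The abstract's point is that systems are themselves correlated, so a τ obtained this way can be positive and significant even when the measurements are not actually related.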