In this study, the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. The relevance of a sample of TREC results is reassessed on a four-level scale: non-relevant, marginally relevant, fairly relevant, and highly relevant. The data comprise 21 topics and 90 systems from TREC 7, and 20 topics and 121 systems from TREC 8. The measures compared are binary precision, cumulated gain, discounted cumulated gain, and normalised discounted cumulated gain. Different weighting schemes for the relevance levels are tested with the cumulated gain measures. Kendall's rank correlations are computed to determine the extent to which the rankings produced by the different measures are similar. Weighting schemes ranging from binary to those emphasising highly relevant documents form a continuum: the measures correlate strongly at the binary end and less so at the heavily weighted end. The results demonstrate the distinct character of the measures.
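The pipeline described above can be sketched in a few lines: map graded relevance levels to gains under a chosen weighting scheme, score a ranked list with (n)DCG, and compare the system rankings induced by two measures with Kendall's tau. This is a minimal illustration, not the paper's exact implementation; the log2 discount is one common DCG variant, and the weighting schemes shown are illustrative stand-ins for the ones actually tested.

```python
import math
from itertools import combinations

# Illustrative weighting schemes for the four relevance levels
# (non-, marginally, fairly, highly relevant); the actual weights
# used in the study may differ.
BINARY = {0: 0, 1: 1, 2: 1, 3: 1}
GRADED = {0: 0, 1: 1, 2: 5, 3: 10}  # emphasises highly relevant documents

def dcg(gains, at=None):
    """Discounted cumulated gain (common log2-discount variant)."""
    gains = gains[:at] if at is not None else gains
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, at=None):
    """DCG normalised by the DCG of the ideally ordered gain vector."""
    ideal = dcg(sorted(gains, reverse=True), at)
    return dcg(gains, at) / ideal if ideal else 0.0

def kendall_tau(scores_a, scores_b):
    """Kendall's tau between the system rankings induced by two score lists."""
    n = len(scores_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# One system's ranked result list as relevance levels 0..3:
levels = [3, 0, 2, 1, 0]
binary_gains = [BINARY[r] for r in levels]
graded_gains = [GRADED[r] for r in levels]
print(ndcg(binary_gains), ndcg(graded_gains))
```

With per-system scores from two measures in hand, `kendall_tau` quantifies how similarly the measures rank the systems, which is the comparison the study performs across weighting schemes.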