In this study, the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. The relevance of a sample of TREC results is reassessed on a four-level scale: non-relevant, marginally relevant, fairly relevant, and highly relevant. The data comprise 21 topics and 90 systems from TREC 7, and 20 topics and 121 systems from TREC 8. The measures compared are binary precision, cumulated gain, discounted cumulated gain, and normalised discounted cumulated gain. Different weighting schemes for the relevance levels are tested with the cumulated gain measures. Kendall's rank correlations are computed to determine the extent to which the rankings produced by the different measures are similar. Weighting schemes ranging from binary to those emphasising highly relevant documents form a continuum: the measures correlate strongly at the binary end and less so at the heavily weighted end. The results demonstrate the distinct character of the measures.
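The pipeline described above can be sketched in a few lines: map graded relevance levels to gains under a chosen weighting scheme, score a ranked list with (n)DCG, and compare the system rankings induced by two measures with Kendall's tau. This is a minimal illustration, not the paper's exact implementation; the log2 discount is one common DCG variant, and the weighting schemes shown are illustrative stand-ins for the ones actually tested.

```python
import math
from itertools import combinations

# Illustrative weighting schemes for the four relevance levels
# (non-, marginally, fairly, highly relevant); the actual weights
# used in the study may differ.
BINARY = {0: 0, 1: 1, 2: 1, 3: 1}
GRADED = {0: 0, 1: 1, 2: 5, 3: 10}  # emphasises highly relevant documents

def dcg(gains, at=None):
    """Discounted cumulated gain (common log2-discount variant)."""
    gains = gains[:at] if at is not None else gains
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, at=None):
    """DCG normalised by the DCG of the ideally ordered gain vector."""
    ideal = dcg(sorted(gains, reverse=True), at)
    return dcg(gains, at) / ideal if ideal else 0.0

def kendall_tau(scores_a, scores_b):
    """Kendall's tau between the system rankings induced by two score lists."""
    n = len(scores_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# One system's ranked result list as relevance levels 0..3:
levels = [3, 0, 2, 1, 0]
binary_gains = [BINARY[r] for r in levels]
graded_gains = [GRADED[r] for r in levels]
print(ndcg(binary_gains), ndcg(graded_gains))
```

With per-system scores from two measures in hand, `kendall_tau` quantifies how similarly the measures rank the systems, which is the comparison the study performs across weighting schemes.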