This study compares rankings of IR systems based on binary and graded relevance, using TREC 7 and TREC 8 data. The relevance of a sample of TREC results is reassessed on a four-level scale: non-relevant, marginally relevant, fairly relevant, and highly relevant. The data comprise 21 topics and 90 systems from TREC 7, and 20 topics and 121 systems from TREC 8. The measures compared are binary precision, cumulated gain, discounted cumulated gain, and normalised discounted cumulated gain. Different weighting schemes for the relevance levels are tested with the cumulated gain measures. Kendall's rank correlations are computed to determine how similar the rankings produced by the different measures are. The weighting schemes form a continuum from binary to schemes that emphasise highly relevant documents; the measures correlate strongly at the binary end and less strongly at the heavily weighted end. The results show the different character of the measures.
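The comparison described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the abstract gives neither the discount function nor the weights, so the sketch assumes the commonly used formulation in which the gain at rank i is divided by log_b(i) once i >= b (here b = 2). The two weighting schemes, the three systems, and the recall base are hypothetical toy data chosen only to show how the system ordering can change between the binary end and the heavily weighted end of the continuum.

```python
import math
from itertools import combinations


def cumulated_gain(gains):
    """Running sum of gain values down a ranked list (CG vector)."""
    total, out = 0.0, []
    for g in gains:
        total += g
        out.append(total)
    return out


def discounted_cumulated_gain(gains, base=2):
    """DCG vector; assumes gains at rank >= base are divided by log_base(rank)."""
    total, out = 0.0, []
    for rank, g in enumerate(gains, start=1):
        discount = math.log(rank) / math.log(base) if rank >= base else 1.0
        total += g / discount
        out.append(total)
    return out


def ndcg_at(levels, recall_base_levels, weights, k, base=2):
    """nDCG at cutoff k: DCG of the run divided by DCG of the ideal ordering."""
    gains = [weights[lvl] for lvl in levels[:k]]
    ideal = sorted((weights[lvl] for lvl in recall_base_levels), reverse=True)[:k]
    if not gains or not ideal:
        return 0.0
    idcg = discounted_cumulated_gain(ideal, base)[-1]
    return discounted_cumulated_gain(gains, base)[-1] / idcg if idcg > 0 else 0.0


def kendall_tau(x, y):
    """Kendall's tau-a between two score lists over the same systems (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical weighting schemes for levels 0..3 (non, marginal, fair, high).
BINARY = {0: 0, 1: 1, 2: 1, 3: 1}
HEAVY = {0: 0, 1: 1, 2: 10, 3: 100}

# Hypothetical ranked results for one topic, as relevance levels per rank.
systems = {
    "A": [3, 0, 0, 1, 0],
    "B": [1, 1, 1, 0, 0],
    "C": [0, 2, 1, 0, 1],
}
# Hypothetical recall base: levels of all judged relevant documents.
recall_base = [3, 2, 1, 1, 1]

names = sorted(systems)
binary_scores = [ndcg_at(systems[n], recall_base, BINARY, k=5) for n in names]
heavy_scores = [ndcg_at(systems[n], recall_base, HEAVY, k=5) for n in names]
tau = kendall_tau(binary_scores, heavy_scores)
print(dict(zip(names, binary_scores)), dict(zip(names, heavy_scores)), tau)
```

In this toy data the system that retrieves the single highly relevant document first scores lowest under binary weights and highest under the heavy weights, so the two schemes order the systems differently. With real data, scores would be averaged over topics before correlating the system rankings, and a tie-aware variant of Kendall's tau may be preferable.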