Binary and graded relevance in IR evaluations: Comparison of the effects on ranking of IR systems

Authors:
Jaana Kekäläinen
Affiliations:
Department of Information Studies, FIN-33014 University of Tampere, Finland
Venue:
Information Processing and Management: an International Journal
Year:
2005

Citing 16
Cited 19

A re-examination of relevance: toward a dynamic, situational definition

Information Processing and Management: an International Journal
Variations in relevance judgments and the evaluation of retrieval performance

Information Processing and Management: an International Journal
Presenting results of experimental retrieval comparisons

Information Processing and Management: an International Journal - Special issue on evaluation issues in information retrieval
Using statistical testing in the evaluation of retrieval experiments

SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
User-defined relevance criteria: an exploratory study

Journal of the American Society for Information Science - Special issue: relevance research
Variations in relevance judgments and the measurement of retrieval effectiveness

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Measures of relative relevance and ranked half-life: performance indicators for interactive IR

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Towards the identification of the optimal number of relevance categories

Journal of the American Society for Information Science
Evaluating evaluation measure stability

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
IR evaluation methods for retrieving highly relevant documents

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Dimensions of relevance

Information Processing and Management: an International Journal
Variations in relevance judgments and the measurement of retrieval effectiveness

Information Processing and Management: an International Journal
Evaluation by highly relevant documents

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Liberal relevance criteria of TREC: counting on negligible documents?

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Cumulated gain-based evaluation of IR techniques

ACM Transactions on Information Systems (TOIS)
Average gain ratio: a simple retrieval performance measure for evaluation with multiple relevance levels

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

Cost-sensitive supported vector learning to rank imbalanced data set

ICIC'09 Proceedings of the Intelligent computing 5th international conference on Emerging intelligent computing technology and applications
Ordinal regression with sparse Bayesian

ICIC'09 Proceedings of the Intelligent computing 5th international conference on Emerging intelligent computing technology and applications
Discounted cumulated gain based evaluation of multiple-query IR sessions

ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Extending average precision to graded relevance judgments

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Reconsideration of the simulated work task situation: a context instrument for evaluation of information retrieval interaction

Proceedings of the third symposium on Information interaction in context
Assessors' search result satisfaction associated with relevance in a scientific domain

Proceedings of the third symposium on Information interaction in context
PinDr0p: using single-ended audio features to determine call provenance

Proceedings of the 17th ACM conference on Computer and communications security
The reliability of metrics based on graded relevance

AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Bootstrap-Based comparisons of IR metrics for finding one relevant document

AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology
Evaluating scalability in information retrieval with multigraded relevance

AIRS'06 Proceedings of the Third Asia conference on Information Retrieval Technology
Diversity-aware evaluation for paraphrase patterns

TIWTE '11 Proceedings of the TextInfer 2011 Workshop on Textual Entailment
Information retrieval evaluation with partial relevance judgment

BNCOD'06 Proceedings of the 23rd British National Conference on Databases, conference on Flexible and Efficient Information Handling
Evaluation of system measures for incomplete relevance judgment in IR

FQAS'06 Proceedings of the 7th international conference on Flexible Query Answering Systems
The effects of relevance feedback quality and quantity in interactive relevance feedback: a simulation based on user modeling

ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
A multimedia retrieval framework based on automatic graded relevance judgments

MMM'12 Proceedings of the 18th international conference on Advances in Multimedia Modeling
Learning binary codes for collaborative filtering

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Top-k learning to rank: labeling, ranking and evaluation

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
A survey of faceted search

Journal of Web Engineering
Evaluation in Music Information Retrieval

Journal of Intelligent Information Systems

Abstract

In this study the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. The relevance of a sample of TREC results is reassessed using a relevance scale with four levels: non-relevant, marginally relevant, fairly relevant, and highly relevant. Twenty-one topics and 90 systems from TREC 7 and 20 topics and 121 systems from TREC 8 form the data. The measures compared are binary precision, cumulated gain, discounted cumulated gain, and normalised discounted cumulated gain. Different weighting schemes for relevance levels are tested with the cumulated gain measures. Kendall's rank correlations are computed to determine to what extent the rankings produced by different measures are similar. Weighting schemes from binary to emphasising highly relevant documents form a continuum, where the measures correlate strongly at the binary end and less strongly at the heavily weighted end. The results show the different character of the measures.
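The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the study's code or data: the two weighting schemes (`BINARY`, `HEAVY`), the three system runs, and all function names are hypothetical and chosen only for illustration. The discount follows the commonly cited cumulated gain formulation (no discount before rank `base`, division by log_base(rank) afterwards), and the ideal ordering used for normalisation is simplified to a re-sort of the retrieved list.

```python
import math
from itertools import accumulate, combinations

# Hypothetical weights for the four relevance levels (0 = non-relevant,
# 1 = marginally, 2 = fairly, 3 = highly relevant). Illustrative values only.
BINARY = {0: 0, 1: 1, 2: 1, 3: 1}
HEAVY = {0: 0, 1: 1, 2: 10, 3: 100}


def cg(levels, weights):
    """Return the cumulated gain vector: a running sum of weighted gains."""
    return list(accumulate(weights[level] for level in levels))


def dcg(levels, weights, base=2):
    """Return the discounted cumulated gain vector of a ranked result list."""
    total, vector = 0.0, []
    for rank, level in enumerate(levels, start=1):
        gain = weights[level]
        # Ranks below `base` are not discounted; later ranks are divided
        # by log_base(rank).
        total += gain if rank < base else gain / math.log(rank, base)
        vector.append(total)
    return vector


def ndcg(levels, weights, base=2):
    """Normalise DCG by the DCG of an ideal ordering.

    Simplification (an assumption of this sketch): the ideal ordering
    re-sorts the retrieved list instead of using every relevant document
    known for the topic.
    """
    ideal = sorted(levels, key=lambda level: weights[level], reverse=True)
    pairs = zip(dcg(levels, weights, base), dcg(ideal, weights, base))
    return [actual / best if best else 0.0 for actual, best in pairs]


def kendall_tau(xs, ys):
    """Kendall's tau-a between two score lists over the same systems."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(xs) * (len(xs) - 1) / 2)


# Made-up relevance levels of the top five results of three systems.
runs = {
    "sysA": [3, 0, 0, 1, 0],
    "sysB": [1, 1, 1, 0, 0],
    "sysC": [0, 2, 0, 0, 1],
}
binary_scores = [dcg(run, BINARY)[-1] for run in runs.values()]
heavy_scores = [dcg(run, HEAVY)[-1] for run in runs.values()]
print("tau(binary, heavy) =", kendall_tau(binary_scores, heavy_scores))
```

In this toy data the system that returns a single highly relevant document at rank 1 moves from second place under binary weights to first place under the heavy weights, so the two system rankings disagree and the correlation drops. This is the kind of effect the abstract reports at the heavily weighted end of the continuum.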