System effectiveness evaluation in a TREC-like environment is usually performed on a common set of topics. We show that a reliable evaluation can be obtained even when each system is evaluated on a different set of topics, and that reliability increases further with appropriate topic selection strategies and metric normalizations.
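As a rough illustration of the idea (a sketch, not the paper's actual method), the snippet below uses synthetic per-topic scores: each topic's scores are z-standardized across systems and mapped through the normal CDF, in the spirit of score standardization for inter-collection comparison; each system is then scored on its own random topic subset, and agreement with the full-topic ranking is measured with Kendall's tau. All names and numbers (`scores`, `subset_size`, the Beta-distributed data) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau, norm

rng = np.random.default_rng(42)

# Hypothetical per-topic effectiveness scores (e.g., average precision):
# rows are systems, columns are topics. Real scores would come from runs
# on a TREC-like collection; Beta(2, 5) noise stands in for them here.
n_systems, n_topics = 20, 50
scores = rng.beta(2.0, 5.0, size=(n_systems, n_topics))

def standardize(scores: np.ndarray) -> np.ndarray:
    """Z-score each topic across systems, then squash to [0, 1] with the
    normal CDF, in the spirit of Webber et al.'s score standardization."""
    mu = scores.mean(axis=0, keepdims=True)
    sigma = scores.std(axis=0, keepdims=True) + 1e-12  # avoid divide-by-zero
    return norm.cdf((scores - mu) / sigma)

std_scores = standardize(scores)

# Evaluate each system on its own random subset of topics (different
# topics per system), averaging its standardized scores over the subset.
subset_size = 10
subset_means = np.array([
    std_scores[i, rng.choice(n_topics, size=subset_size, replace=False)].mean()
    for i in range(n_systems)
])

# Reliability proxy: rank correlation between the subset-based ranking
# and the ranking obtained from the full topic set.
full_means = std_scores.mean(axis=1)
tau, _ = kendalltau(full_means, subset_means)
print(f"Kendall's tau vs. full-topic ranking: {tau:.2f}")
```

Standardization factors out topic-specific difficulty, so averages taken over different topic subsets stay on a comparable scale; replacing the random subsets with a deliberate topic selection strategy is where the abstract claims further reliability gains.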