A few good topics: Experiments in topic set reduction for retrieval evaluation

Authors:
John Guiver;Stefano Mizzaro;Stephen Robertson
Affiliations:
Microsoft Research Cambridge, Cambridge, UK;University of Udine, Udine, Italy;Microsoft Research Cambridge, Cambridge, UK
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2009

Citing 15
Cited 13

How reliable are the results of large-scale information retrieval experiments?

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating evaluation measure stability

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Blind Men and Elephants: Six Approaches to TREC data

Information Retrieval
The effect of topic set size on retrieval experiment error

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval system evaluation: effort, sensitivity, and reliability

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

SIGIR '06 The 29th Annual International SIGIR Conference
Minimal test collections for retrieval evaluation

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating evaluation metrics based on the bootstrap

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 15th ACM international conference on Information and knowledge management

Conference on Information and Knowledge Management
On GMAP: and other transformations

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Estimating average precision with incomplete and imperfect judgments

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
On the reliability of information retrieval metrics based on graded relevance

Information Processing and Management: an International Journal - Special issue: AIRS2005: Information retrieval research in Asia
Hits hits TREC: exploring IR evaluation results with network analysis

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Precision-at-ten considered redundant

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Statistical power in retrieval experimentation

Proceedings of the 17th ACM conference on Information and knowledge management

IR Evaluation without a Common Set of Topics

ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Relying on topic subsets for system ranking estimation

Proceedings of the 18th ACM conference on Information and knowledge management
Boiling down information retrieval test collections

RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information
On the contributions of topics to system evaluation

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Selecting a subset of queries for acquisition of further relevance judgements

ICTIR'11 Proceedings of the Third international conference on Advances in information retrieval theory
Query sampling for learning data fusion

Proceedings of the 20th ACM international conference on Information and knowledge management
Prioritizing relevance judgments to improve the construction of IR test collections

Proceedings of the 20th ACM international conference on Information and knowledge management
Optimizing the cost of information retrieval testcollections

Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
A case for automatic system evaluation

ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
An uncertainty-aware query selection model for evaluation of IR systems

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Using crowdsourcing for TREC relevance assessment

Information Processing and Management: an International Journal
On Using Fewer Topics in Information Retrieval Evaluations

Proceedings of the 2013 Conference on the Theory of Information Retrieval
Evaluation in Music Information Retrieval

Journal of Intelligent Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

We consider the issue of evaluating information retrieval systems on the basis of a limited number of topics. In contrast to statistically-based work on sample sizes, we hypothesize that some topics or topic sets are better than others at predicting true system effectiveness, and that with the right choice of topics, accurate predictions can be obtained from small topics sets. Using a variety of effectiveness metrics and measures of goodness of prediction, a study of a set of TREC and NTCIR results confirms this hypothesis, and provides evidence that the value of a topic set for this purpose does generalize.