Retrieval mechanisms are frequently compared by computing their average scores for some effectiveness metric across a common set of information needs, or topics, with researchers concluding that one method is superior based on those averages. Since comparative retrieval system behavior is known to be highly variable across topics, good experimental design requires that a "sufficient" number of topics be used in the test. This paper uses TREC results to empirically derive error rates based on the number of topics used in a test and the observed difference in the average scores. The error rates quantify the likelihood that a different set of topics of the same size would lead to a different conclusion. We directly compute error rates for topic sets of up to 25 topics and extrapolate those rates to larger topic set sizes. The observed error rates are larger than anticipated, indicating that researchers need to take care when concluding that one method is better than another, especially if few topics are used.
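The error-rate idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's exact procedure. It assumes a hypothetical `scores` dictionary that maps each run name to its per-topic scores, with every run listing topics in the same order. It repeatedly draws two disjoint topic sets of the same size and counts a "swap" whenever the two sets disagree about which of a pair of runs has the higher average. Each comparison is bucketed by the absolute score difference observed on the first set, using an assumed `bin_width`. The extrapolation step (`fit_log_linear`, `extrapolate`) assumes, for illustration only, that the error rate decays exponentially with topic set size.

```python
import math
import random
from collections import defaultdict


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def swap_rates(scores, set_size, bin_width=0.01, trials=50, seed=0):
    """Estimate error (swap) rates for one topic set size, bucketed by score difference.

    scores: hypothetical dict run_name -> list of per-topic scores (same topic order).
    Returns a dict mapping difference-bin index -> fraction of comparisons that swapped.
    """
    rng = random.Random(seed)
    runs = sorted(scores)
    n_topics = len(scores[runs[0]])
    if 2 * set_size > n_topics:
        raise ValueError("need at least 2 * set_size topics to draw two disjoint sets")
    swaps, counts = defaultdict(int), defaultdict(int)
    for _ in range(trials):
        # Two disjoint topic sets of the same size.
        drawn = rng.sample(range(n_topics), 2 * set_size)
        set_a, set_b = drawn[:set_size], drawn[set_size:]
        for i in range(len(runs)):
            for j in range(i + 1, len(runs)):
                x, y = scores[runs[i]], scores[runs[j]]
                diff_a = _mean(x[t] for t in set_a) - _mean(y[t] for t in set_a)
                diff_b = _mean(x[t] for t in set_b) - _mean(y[t] for t in set_b)
                bucket = int(abs(diff_a) / bin_width)
                counts[bucket] += 1
                # A swap: the two topic sets order the runs in opposite directions
                # (exact ties on either set are not counted as swaps here).
                if diff_a * diff_b < 0:
                    swaps[bucket] += 1
    return {k: swaps[k] / counts[k] for k in sorted(counts)}


def fit_log_linear(sizes, rates):
    """Least-squares fit of log(rate) = intercept + slope * size (zero rates are skipped)."""
    points = [(s, math.log(r)) for s, r in zip(sizes, rates) if r > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-zero rates to fit")
    n = len(points)
    mean_s = sum(s for s, _ in points) / n
    mean_l = sum(l for _, l in points) / n
    sxx = sum((s - mean_s) ** 2 for s, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct set sizes")
    slope = sum((s - mean_s) * (l - mean_l) for s, l in points) / sxx
    return mean_l - slope * mean_s, slope


def extrapolate(intercept, slope, size):
    """Predicted error rate at a larger topic set size under the fitted exponential model."""
    return math.exp(intercept + slope * size)
```

In use, one would call `swap_rates` for each set size from 5 up to 25, collect the rate for a given difference bin across those sizes, and pass the series to `fit_log_linear`; `extrapolate(intercept, slope, 50)` then gives a rough estimate for 50 topics. That estimate is only as trustworthy as the assumed exponential form and the number of runs and topics available.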