This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules of thumb experimenters use, such as that a good experiment needs at least 25 queries and that 50 is better, while challenging other beliefs, such as that the common evaluation measures are equally reliable. As an example, we show that Precision at 30 documents has about twice the average error rate of Average Precision. These results can help information retrieval researchers design experiments that provide a desired level of confidence in their results. In particular, we suggest that researchers using Web measures such as Precision at 10 documents will need to use many more than 50 queries, or will have to require a very large difference in evaluation scores between two methods before concluding that the two methods are actually different.
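
For readers unfamiliar with the two measures compared above, the following is a minimal sketch of the textbook definitions of Precision at k documents and Average Precision for a single query, assuming binary relevance judgments. The function names, the list encoding of a ranking (1 for relevant, 0 for not relevant), and the sample ranking are illustrative assumptions; the abstract does not specify the exact variants or implementation used in the experiments.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    `relevance` is a hypothetical encoding of one ranked list:
    1 if the document at that rank is relevant, 0 otherwise.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Ranks beyond the end of the list count as non-relevant.
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int], num_relevant: int) -> float:
    """Sum of the precision at each rank holding a relevant document,
    divided by the total number of relevant documents for the query.

    `num_relevant` is the number of relevant documents in the collection,
    so relevant documents that were never retrieved contribute zero.
    """
    if num_relevant <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant


# Illustrative ranking: relevant documents at ranks 1, 3 and 6.
ranking = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
p10 = precision_at_k(ranking, 10)
ap = average_precision(ranking, num_relevant=3)
```

Precision at k looks only at how many relevant documents fall within the cutoff, while Average Precision also depends on where in the ranking each relevant document appears. This difference in what the measures take into account is one plausible reason they could differ in stability, though the abstract itself does not state the cause.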