Two stages in measurement of techniques for information retrieval are gathering of documents for relevance assessment and use of the assessments to numerically evaluate effectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measurements that are trustworthy and fair. Our detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found. We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.
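
To make the recall claim concrete, the following is a minimal sketch of conventional depth-k pooling, in which only the top k documents of each system's ranking are judged. It is not the new pooling strategy proposed here, whose details the abstract does not give. The function names, system names, document identifiers, and the set of truly relevant documents are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Return the union of the top-k documents from every system's ranking."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def recall(ranking: List[str], relevant: Set[str]) -> float:
    """Fraction of the given relevant set that appears in the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranking) & relevant) / len(relevant)


# Hypothetical ranked outputs of two systems for one query.
runs = {
    "sys_a": ["d1", "d2", "d3", "d4"],
    "sys_b": ["d2", "d5", "d6", "d7"],
}

# Hypothetical ground truth; in a real experiment this set is unknown.
true_relevant = {"d1", "d2", "d4", "d7"}

# Only pooled documents are judged, so relevant documents
# outside the pool are never counted.
pool = depth_k_pool(runs, k=2)
judged_relevant = true_relevant & pool

measured = recall(runs["sys_a"], judged_relevant)  # against judged documents only
actual = recall(runs["sys_a"], true_relevant)      # against all relevant documents

print(f"pool size: {len(pool)}, measured recall: {measured}, actual recall: {actual}")
```

In this toy case the shallow pool misses two of the four relevant documents, so recall measured against the judged set is higher than recall against the full set. That is the direction of error the abstract describes.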