The most prevalent experimental methodology for comparing the effectiveness of information retrieval systems requires a test collection, composed of a set of documents, a set of query topics, and a set of relevance judgments indicating which documents are relevant to which topics. It is well known that relevance judgments are not infallible, but recent retrospective investigation into results from the Text REtrieval Conference (TREC) has shown that differences in human judgments of relevance do not affect the relative measured performance of retrieval systems. Based on this result, we propose and describe the initial results of a new evaluation methodology that replaces human relevance judgments with a randomly selected mapping of documents to topics, which we refer to as pseudo-relevance judgments. Rankings of systems under our methodology correlate positively with official TREC rankings, although the performance of the top systems is not predicted well. The correlations are stable over a variety of pool depths and sampling techniques. With improvements, such a methodology could be useful for evaluating systems such as World-Wide Web search engines, where the set of documents changes too often to make traditional collection construction techniques practical.
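The core of the methodology can be sketched in a few lines: pool the documents retrieved by all systems for a topic, randomly mark a fraction of that pool as "relevant" (the pseudo-relevance judgments), score each system against those judgments, and rank systems by mean score. The sketch below is a minimal, hypothetical illustration of that pipeline, not the authors' implementation; the sampling rate, the use of average precision as the effectiveness measure, and all function names are assumptions for the sake of the example.

```python
import random

def pseudo_qrels(pool, sample_rate, rng):
    """Randomly mark a fraction of the pooled documents as 'relevant'.

    pool: set of document IDs retrieved by any system for one topic.
    sample_rate: assumed fraction of the pool to label relevant.
    """
    k = max(1, round(sample_rate * len(pool)))
    return set(rng.sample(sorted(pool), k))  # sort for reproducibility

def average_precision(ranking, relevant):
    """Uninterpolated average precision of one ranked list."""
    hits, score = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

def rank_systems(runs, qrels):
    """Order systems by mean AP over topics under the given judgments.

    runs: {system_name: {topic_id: ranked list of doc IDs}}
    qrels: {topic_id: set of (pseudo-)relevant doc IDs}
    """
    mean_ap = {
        sys: sum(average_precision(docs, qrels[t])
                 for t, docs in topics.items()) / len(topics)
        for sys, topics in runs.items()
    }
    return sorted(mean_ap, key=mean_ap.get, reverse=True)

# Toy example: two systems, one topic, judgments sampled from the pool.
runs = {"sysA": {"t1": ["d1", "d2", "d3"]},
        "sysB": {"t1": ["d3", "d2", "d1"]}}
pool = {d for topics in runs.values() for docs in topics.values() for d in docs}
qrels = {"t1": pseudo_qrels(pool, sample_rate=0.3, rng=random.Random(42))}
ordering = rank_systems(runs, qrels)
```

The resulting `ordering` can then be compared against an official ranking with a rank correlation such as Kendall's tau, which is how the correlations reported above would be measured.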