Ranking retrieval systems without relevance judgments

  • Authors:
  • Ian Soboroff; Charles Nicholas; Patrick Cahan

  • Affiliations:
  • Univ. of Maryland, Baltimore County; Univ. of Maryland, Baltimore County; Univ. of Maryland, Baltimore County

  • Venue:
  • Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2001

Abstract

The most prevalent experimental methodology for comparing the effectiveness of information retrieval systems requires a test collection, composed of a set of documents, a set of query topics, and a set of relevance judgments indicating which documents are relevant to which topics. It is well known that relevance judgments are not infallible, but recent retrospective investigation into results from the Text REtrieval Conference (TREC) has shown that differences in human judgments of relevance do not affect the relative measured performance of retrieval systems. Based on this result, we propose and describe the initial results of a new evaluation methodology which replaces human relevance judgments with a randomly selected mapping of documents to topics, which we refer to as pseudo-relevance judgments. Rankings of systems with our methodology correlate positively with official TREC rankings, although the performance of the top systems is not predicted well. The correlations are stable over a variety of pool depths and sampling techniques. With improvements, such a methodology could be useful in evaluating systems such as World-Wide Web search engines, where the set of documents changes too often to make traditional collection construction techniques practical.
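
To make the idea concrete, the following Python sketch (not the authors' code) shows one way such a pseudo-relevance evaluation could be run under stated assumptions: `runs` maps each system name to a dict from topic id to that system's ranked list of document ids, the pool depth and sampling rate are illustrative parameters, and mean average precision and Kendall's tau stand in for whatever effectiveness measure and rank-correlation statistic one prefers. The paper itself studies several pool depths and sampling techniques; this is only a minimal illustration of the shared mechanics.

    import random
    from itertools import combinations


    def pseudo_qrels(runs, topic, pool_depth=100, sample_rate=0.1, rng=None):
        """Pool the top `pool_depth` documents from every system for one topic
        and randomly mark a fraction of the pool as pseudo-relevant."""
        rng = rng or random.Random()
        pool = set()
        for ranking_by_topic in runs.values():
            pool.update(ranking_by_topic[topic][:pool_depth])
        k = min(len(pool), max(1, int(sample_rate * len(pool))))
        return set(rng.sample(sorted(pool), k))


    def average_precision(ranking, relevant):
        """Non-interpolated average precision of one ranked list."""
        hits, precision_sum = 0, 0.0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant) if relevant else 0.0


    def rank_systems(runs, topics, **sampling_kwargs):
        """Score every system against the same pseudo-qrels per topic,
        then return system names ordered best-first by mean AP."""
        qrels = {t: pseudo_qrels(runs, t, **sampling_kwargs) for t in topics}
        scores = {
            system: sum(average_precision(by_topic[t], qrels[t]) for t in topics) / len(topics)
            for system, by_topic in runs.items()
        }
        return sorted(scores, key=scores.get, reverse=True)


    def kendall_tau(order_a, order_b):
        """Kendall's tau between two rankings of the same set of systems."""
        pos_a = {s: i for i, s in enumerate(order_a)}
        pos_b = {s: i for i, s in enumerate(order_b)}
        concordant = discordant = 0
        for x, y in combinations(order_a, 2):
            agree = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
            concordant += agree > 0
            discordant += agree < 0
        pairs = concordant + discordant
        return (concordant - discordant) / pairs if pairs else 0.0

In this sketch, `rank_systems(runs, topics)` yields a ranking of systems under pseudo-relevance judgments, and `kendall_tau(pseudo_ranking, official_ranking)` measures how well that ranking agrees with an official ordering such as the TREC results; the key design point is that every system is scored against the same randomly sampled pseudo-qrels for each topic, so no human judgments are required.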