The empirical investigation of the effectiveness of information retrieval (IR) systems requires a test collection, a set of query topics, and relevance judgments made by human assessors for each query. Previous experiments show that differences among human relevance assessments do not affect the relative performance of retrieval systems. Based on this observation, we propose and evaluate a new approach that replaces human relevance judgments with an automatic method. The ranking of retrieval systems produced by our methodology correlates positively and significantly with the ranking obtained from human-based evaluations. In the experiments, we assume a Web-like imperfect environment: the indexing information for all documents is available for ranking, but some documents may not be available for retrieval. Such conditions can arise from document deletions or network problems. Our method of simulating imperfect environments can be used for Web search engine assessment and for estimating the effects of network conditions (e.g., network unreliability) on IR system performance.
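The evaluation pipeline described above can be illustrated with a short sketch: derive pseudo-relevance judgments automatically from the systems' own retrieval runs, score every system against both the automatic and the human judgments, compare the two resulting system rankings with a rank correlation such as Kendall's tau, and optionally drop a random fraction of documents to mimic the Web-like imperfect environment. This is only a minimal illustration under assumptions of our own: the voting-based pooling heuristic (`min_votes`), the cutoff `k`, the precision@k measure, and the deletion rate are illustrative stand-ins, not the paper's exact data-fusion procedure.

```python
import random
from collections import Counter
from itertools import combinations

def pseudo_qrels(runs, k=10, min_votes=2):
    """Treat documents that appear in the top-k of at least `min_votes`
    systems as pseudo-relevant (a simple voting/pooling heuristic).
    `runs` maps system name -> {query -> ranked list of doc ids}."""
    judgments = {}
    queries = next(iter(runs.values())).keys()
    for query in queries:
        votes = Counter()
        for ranking in runs.values():
            votes.update(ranking[query][:k])
        judgments[query] = {doc for doc, v in votes.items() if v >= min_votes}
    return judgments

def precision_at_k(runs, qrels, k=10):
    """Mean precision@k per system against the given judgments."""
    scores = {}
    for system, ranking in runs.items():
        per_query = [len(set(ranking[q][:k]) & qrels[q]) / k for q in qrels]
        scores[system] = sum(per_query) / len(per_query)
    return scores

def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same set of systems."""
    pos_a = {s: i for i, s in enumerate(order_a)}
    pos_b = {s: i for i, s in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        concordant += sign > 0
        discordant += sign < 0
    pairs = len(order_a) * (len(order_a) - 1) / 2
    return (concordant - discordant) / pairs

def simulate_unavailability(runs, deletion_rate=0.2, seed=0):
    """Remove a random fraction of documents from every run, mimicking
    deleted pages or network failures in a Web-like environment."""
    rng = random.Random(seed)
    docs = {d for run in runs.values() for lst in run.values() for d in lst}
    missing = {d for d in docs if rng.random() < deletion_rate}
    return {name: {q: [d for d in lst if d not in missing]
                   for q, lst in run.items()}
            for name, run in runs.items()}
```

With system rankings obtained as `sorted(scores, key=scores.get, reverse=True)` for the automatic and the human-based score dictionaries, a high Kendall's tau indicates that the automatic judgments reproduce the human-based ordering of the systems; applying `simulate_unavailability` before scoring shows how document deletions or network failures shift that ordering.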