A case for improved evaluation of query difficulty prediction

  • Authors:
  • Falk Scholer; Steven Garcia

  • Affiliations:
  • RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia

  • Venue:
  • Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2009

Abstract

Query difficulty prediction aims to identify, in advance, how well an information retrieval system will perform when faced with a particular search request. The current standard evaluation methodology involves calculating a correlation coefficient to indicate how strongly the predicted query difficulty is related to an actual system performance measure, usually Average Precision. We run a series of experiments based on predictors that have been shown to perform well in the literature, comparing these across different TREC runs. Our results demonstrate that the current evaluation methodology is severely limited: although it can be used to demonstrate the performance of a predictor for a single system, such performance is not consistent across a variety of retrieval systems. We conclude that published results in the query difficulty area are generally not comparable, and recommend that prediction be evaluated against a spectrum of underlying search systems.
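
To make the evaluation methodology concrete, below is a minimal sketch, assuming per-query predictor scores and per-query Average Precision values are available as arrays. The function name (evaluate_predictor), variable names, and the synthetic data are illustrative assumptions, not the authors' code or data. In line with the paper's recommendation, the sketch correlates the same predictor scores against AP values from several retrieval runs rather than a single one.

```python
import numpy as np
from scipy.stats import pearsonr, kendalltau

def evaluate_predictor(predicted_difficulty, average_precision):
    """Correlate predicted query difficulty with actual per-query AP.

    predicted_difficulty: one predictor score per query.
    average_precision: AP per query, measured on one retrieval run.
    """
    r, _ = pearsonr(predicted_difficulty, average_precision)
    tau, _ = kendalltau(predicted_difficulty, average_precision)
    return {"pearson_r": r, "kendall_tau": tau}

# Hypothetical example: score the same predictor against several
# underlying retrieval runs, since a strong correlation with a
# single run need not carry over to other systems.
rng = np.random.default_rng(0)
predictor_scores = rng.random(50)  # one score per query (synthetic)
runs = {f"run_{i}": rng.random(50) for i in range(3)}  # synthetic AP values

for name, ap in runs.items():
    print(name, evaluate_predictor(predictor_scores, ap))
```

A predictor that is robust in the sense the paper argues for would show consistently high correlations across all runs, not just a high value for one favourable system.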