A case for improved evaluation of query difficulty prediction

  • Authors:
  • Falk Scholer; Steven Garcia

  • Affiliations:
  • RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia

  • Venue:
  • Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2009

Abstract

Query difficulty prediction aims to identify, in advance, how well an information retrieval system will perform when faced with a particular search request. The current standard evaluation methodology involves calculating a correlation coefficient to indicate how strongly the predicted query difficulty is related to an actual system performance measure, usually Average Precision. We run a series of experiments based on predictors that have been shown to perform well in the literature, comparing these across different TREC runs. Our results demonstrate that the current evaluation methodology is severely limited: although it can be used to demonstrate the performance of a predictor for a single system, such performance is not consistent across a variety of retrieval systems. We conclude that published results in the query difficulty area are generally not comparable, and recommend that prediction be evaluated against a spectrum of underlying search systems.
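
To make the evaluation methodology concrete, below is a minimal sketch, assuming per-query predictor scores and per-query Average Precision values are available as arrays. The function name (evaluate_predictor), variable names, and the synthetic data are illustrative assumptions, not the authors' code or data. In line with the paper's recommendation, the sketch correlates the same predictor scores against AP values from several retrieval runs rather than a single one.

```python
import numpy as np
from scipy.stats import pearsonr, kendalltau

def evaluate_predictor(predicted_difficulty, average_precision):
    """Correlate predicted query difficulty with actual per-query AP.

    predicted_difficulty: one predictor score per query.
    average_precision: AP per query, measured on one retrieval run.
    """
    r, _ = pearsonr(predicted_difficulty, average_precision)
    tau, _ = kendalltau(predicted_difficulty, average_precision)
    return {"pearson_r": r, "kendall_tau": tau}

# Hypothetical example: score the same predictor against several
# underlying retrieval runs, since a strong correlation with a
# single run need not carry over to other systems.
rng = np.random.default_rng(0)
predictor_scores = rng.random(50)  # one score per query (synthetic)
runs = {f"run_{i}": rng.random(50) for i in range(3)}  # synthetic AP values

for name, ap in runs.items():
    print(name, evaluate_predictor(predictor_scores, ap))
```

A predictor that is robust in the sense the paper argues for would show consistently high correlations across all runs, not just a high value for one favourable system.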