User model-based metrics for offline query suggestion evaluation

  • Authors:
  • Eugene Kharitonov;Craig Macdonald;Pavel Serdyukov;Iadh Ounis

  • Affiliations:
  • Yandex & Glasgow University, Moscow, Russian Fed.;Glasgow University, Glasgow, United Kingdom;Yandex, Moscow, Russian Fed.;Glasgow University, Glasgow, United Kingdom

  • Venue:
  • Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Query suggestion or auto-completion mechanisms are widely used by search engines and are increasingly attracting interest from the research community. However, the lack of commonly accepted evaluation methodology and metrics means that it is not possible to compare results and approaches from the literature. Moreover, often the metrics used to evaluate query suggestions tend to be an adaptation from other domains without a proper justification. Hence, it is not necessarily clear if the improvements reported in the literature would result in an actual improvement in the users' experience. Inspired by the cascade user models and state-of-the-art evaluation metrics in the web search domain, we address the query suggestion evaluation, by first studying the users behaviour from a search engine's query log and thereby deriving a new family of user models describing the users interaction with a query suggestion mechanism. Next, assuming a query log-based evaluation approach, we propose two new metrics to evaluate query suggestions, pSaved and eSaved. Both metrics are parameterised by a user model. pSaved is defined as the probability of using the query suggestions while submitting a query. eSaved equates to the expected relative amount of effort (keypresses) a user can avoid due to the deployed query suggestion mechanism. Finally, we experiment with both metrics using four user model instantiations as well as metrics previously used in the literature on a dataset of 6.1M sessions. Our results demonstrate that pSaved and eSaved show the best alignment with the users satisfaction amongst the considered metrics.