Evaluating aggregated search using interleaving

Authors:
Aleksandr Chuklin;Anne Schuth;Katja Hofmann;Pavel Serdyukov;Maarten de Rijke
Affiliations:
Yandex & ISLA, University of Amsterdam, Moscow, Russian Fed.;ISLA, University of Amsterdam, Amsterdam, Netherlands;ISLA, University of Amsterdam, Amsterdam, Netherlands;Yandex, Moscow, Russian Fed.;ISLA, University of Amsterdam, Amsterdam, Netherlands
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013

Citing 20
Cited 2

Optimizing search by showing results in context

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Accurately interpreting clickthrough data as implicit feedback

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Novelty and diversity in information retrieval evaluation

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
How does clickthrough data reflect retrieval quality?

Proceedings of the 17th ACM conference on Information and knowledge management
Diversifying search results

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Expected reciprocal rank for graded relevance

Proceedings of the 18th ACM conference on Information and knowledge management
Evaluation of methods for relative comparison of retrieval systems based on clickthroughs

Proceedings of the 18th ACM conference on Information and knowledge management
On composition of a federated web search result page: using online users to provide pairwise preference for heterogeneous verticals

Proceedings of the fourth ACM international conference on Web search and data mining
A methodology for evaluating aggregated search results

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Smoothing click counts for aggregated vertical search

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
A probabilistic method for inferring preferences from clicks

Proceedings of the 20th ACM international conference on Information and knowledge management
Large-scale validation and analysis of interleaved search evaluation

ACM Transactions on Information Systems (TOIS)
Beyond ten blue links: enabling user click modeling in federated web search

Proceedings of the fifth ACM international conference on Web search and data mining
Evaluating aggregated search pages

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Optimized interleaving for online retrieval evaluation

Proceedings of the sixth ACM international conference on Web search and data mining
Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval

Information Retrieval
Using intent information to model user behavior in diversified search

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Lerot: an online learning to rank framework

Proceedings of the 2013 workshop on Living labs for information retrieval evaluation
Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods

ACM Transactions on Information Systems (TOIS)

Lerot: an online learning to rank framework

Proceedings of the 2013 workshop on Living labs for information retrieval evaluation
Relative confidence sampling for efficient on-line ranker evaluation

Proceedings of the 7th ACM international conference on Web search and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

A result page of a modern web search engine is often much more complicated than a simple list of "ten blue links." In particular, a search engine may combine results from different sources (e.g., Web, News, and Images), and display these as grouped results to provide a better user experience. Such a system is called an aggregated or federated search system. Because search engines evolve over time, their results need to be constantly evaluated. However, one of the most efficient and widely used evaluation methods, interleaving, cannot be directly applied to aggregated search systems, as it ignores the need to group results originating from the same source (vertical results). We propose an interleaving algorithm that allows comparisons of search engine result pages containing grouped vertical documents. We compare our algorithm to existing interleaving algorithms and other evaluation methods (such as A/B-testing), both on real-life click log data and in simulation experiments. We find that our algorithm allows us to perform unbiased and accurate interleaved comparisons that are comparable to conventional evaluation techniques. We also show that our interleaving algorithm produces a ranking that does not substantially alter the user experience, while being sensitive to changes in both the vertical result block and the non-vertical document rankings. All this makes our proposed interleaving algorithm an essential tool for comparing IR systems with complex aggregated pages.