On the reliability and intuitiveness of aggregated search metrics

Authors:
Ke Zhou;Mounia Lalmas;Tetsuya Sakai;Ronan Cummins;Joemon M. Jose
Affiliations:
University of Glasgow, Glasgow, United Kingdom;Yahoo! Labs, Barcelona, Spain;Waseda University, Tokyo, Japan;University of Greenwich, London, United Kingdom;University of Glasgow, Glasgow, United Kingdom
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

Aggregating search results from diverse verticals such as news, images, videos and Wikipedia into a single interface is a popular web search presentation paradigm. Although several aggregated search (AS) metrics have been proposed to evaluate AS result pages, their properties remain poorly understood. In this paper, we compare the properties of existing AS metrics under the assumptions that (1) queries may have multiple preferred verticals; (2) the likelihood of each vertical preference is available; and (3) the topical relevance assessments of results returned from each vertical are available. We compare a wide range of AS metrics on two test collections. Our main criteria of comparison are (1) discriminative power, which represents the reliability of a metric in comparing the performance of systems, and (2) intuitiveness, which represents how well a metric captures the key aspects to be measured (i.e., the various aspects of a user's perception of AS result pages). Our study shows that the AS metrics that capture key AS components (e.g., vertical selection) have several advantages over other metrics. This work sheds new light on the further development and application of AS metrics.
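
To make the first criterion concrete, the following is a minimal sketch of how discriminative power is commonly estimated: run a significance test on every pair of systems and report the fraction of pairs the metric separates at a chosen significance level. The abstract does not specify the test used in the paper, so a paired sign-flip randomization test stands in here as an assumption. The function names, the `alpha` and `trials` parameters, and the toy scores are all hypothetical.

```python
import random
from itertools import combinations


def sign_flip_p_value(diffs, trials, rng):
    """Two-sided paired randomization test on per-query score differences.

    Assumption: this test is a stand-in; the paper may use a different one.
    """
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    hits = 0
    for _ in range(trials):
        # Randomly swap the two systems' scores on each query (flip the sign).
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            hits += 1
    return hits / trials


def discriminative_power(scores, alpha=0.05, trials=2000, seed=0):
    """Fraction of system pairs found significantly different under one metric.

    scores: dict mapping system name -> list of per-query metric scores,
            with every list in the same query order.
    """
    rng = random.Random(seed)
    pairs = list(combinations(sorted(scores), 2))
    significant = 0
    for a, b in pairs:
        diffs = [x - y for x, y in zip(scores[a], scores[b])]
        if sign_flip_p_value(diffs, trials, rng) < alpha:
            significant += 1
    return significant / len(pairs)


# Hypothetical per-query scores for three systems under a single metric.
toy_scores = {
    "sys_a": [0.9] * 20,
    "sys_b": [0.1] * 20,
    "sys_c": [0.9] * 20,
}
power = discriminative_power(toy_scores)
```

Under this sketch, a metric with higher discriminative power separates more system pairs on the same test collection, which is the sense of "reliability" the abstract refers to. Comparing metrics would amount to calling `discriminative_power` once per metric on that metric's per-query scores.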