Given an ambiguous or underspecified query, search result diversification aims at accommodating different user intents within a single "entry-point" result page. However, some intents are informational, for which many relevant pages may help, while others are navigational, for which only one web page is required. We propose new evaluation metrics for search result diversification that consider this distinction, as well as a simple method for quantitatively comparing the intuitiveness of a given pair of metrics. Our main experimental findings are: (a) in terms of discriminative power, which reflects statistical reliability, the proposed metrics, DIN#-nDCG and P+Q#, are comparable to intent recall and D#-nDCG, and possibly superior to α-nDCG; (b) in terms of preference agreement with intent recall, P+Q# is superior to the other diversity metrics and may therefore be the most intuitive metric that emphasises diversity; and (c) in terms of preference agreement with effective precision, DIN#-nDCG is superior to the other diversity metrics and may therefore be the most intuitive metric that emphasises relevance. Moreover, DIN#-nDCG may be the most intuitive among metrics that consider both diversity and relevance. In addition, we demonstrate that the randomised Tukey Honestly Significant Difference (HSD) test, which takes the entire set of available runs into account, is substantially more conservative than the paired bootstrap test, which considers only one run pair at a time; we therefore recommend the former for significance testing whenever a set of runs is available for evaluation.
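To make the significance-testing comparison concrete, below is a minimal sketch of a randomised Tukey HSD test applied to a topic-by-run matrix of per-topic metric scores (e.g. DIN#-nDCG). This is not the authors' implementation; the function name `randomised_tukey_hsd`, its signature, and the use of NumPy are assumptions for illustration. The key property it illustrates is that every pairwise difference is judged against the same family-wide null distribution of the maximum difference, which is what makes the test more conservative than a per-pair bootstrap test.

```python
import numpy as np

def randomised_tukey_hsd(scores, n_trials=10000, seed=0):
    """Randomised Tukey HSD test (a sketch, not the paper's code).

    scores: (n_topics, n_runs) array, scores[i, j] = metric score of run j on topic i.
    Returns an (n_runs, n_runs) matrix of p-values for all run pairs.
    """
    rng = np.random.default_rng(seed)
    observed = scores.mean(axis=0)                             # mean score per run
    obs_diff = np.abs(observed[:, None] - observed[None, :])   # observed pairwise differences

    max_diffs = np.empty(n_trials)
    for t in range(n_trials):
        # Permute the run labels independently within each topic,
        # then record the largest difference between any two run means.
        permuted = np.array([rng.permutation(row) for row in scores])
        means = permuted.mean(axis=0)
        max_diffs[t] = means.max() - means.min()

    # p-value for a pair = fraction of trials whose family-wide maximum
    # difference is at least as large as that pair's observed difference.
    return (max_diffs[None, None, :] >= obs_diff[:, :, None]).mean(axis=2)

# Toy usage with hypothetical data: 50 topics, 4 runs.
scores = np.random.default_rng(1).random((50, 4))
pvals = randomised_tukey_hsd(scores, n_trials=1000)
```

Because the null distribution is built from the maximum difference across all runs in each trial, adding more runs to the comparison can only make a given pairwise difference harder to declare significant, which matches the conservatism reported above.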