Given an ambiguous or underspecified query, search result diversification aims at accommodating different user intents within a single "entry-point" result page. However, some intents are informational, for which many relevant pages may help, while others are navigational, for which only one web page is required. We propose new evaluation metrics for search result diversification that consider this distinction, as well as a simple method for quantitatively comparing the intuitiveness of a given pair of metrics. Our main experimental findings are: (a) in terms of discriminative power, which reflects statistical reliability, the proposed metrics, DIN#-nDCG and P+Q#, are comparable to intent recall and D#-nDCG, and possibly superior to α-nDCG; (b) in terms of preference agreement with intent recall, P+Q# is superior to the other diversity metrics and may therefore be the most intuitive metric that emphasises diversity; and (c) in terms of preference agreement with effective precision, DIN#-nDCG is superior to the other diversity metrics and may therefore be the most intuitive metric that emphasises relevance. Moreover, DIN#-nDCG may be the most intuitive among metrics that consider both diversity and relevance. In addition, we demonstrate that the randomised Tukey Honestly Significant Difference (HSD) test, which takes the entire set of available runs into account, is substantially more conservative than the paired bootstrap test, which considers only one run pair at a time; we therefore recommend the former for significance testing whenever a set of runs is available for evaluation.
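To make the significance-testing comparison concrete, below is a minimal sketch of a randomised Tukey HSD test applied to a topic-by-run matrix of per-topic metric scores (e.g. DIN#-nDCG). This is not the authors' implementation; the function name `randomised_tukey_hsd`, its signature, and the use of NumPy are assumptions for illustration. The key property it illustrates is that every pairwise difference is judged against the same family-wide null distribution of the maximum difference, which is what makes the test more conservative than a per-pair bootstrap test.

```python
import numpy as np

def randomised_tukey_hsd(scores, n_trials=10000, seed=0):
    """Randomised Tukey HSD test (a sketch, not the paper's code).

    scores: (n_topics, n_runs) array, scores[i, j] = metric score of run j on topic i.
    Returns an (n_runs, n_runs) matrix of p-values for all run pairs.
    """
    rng = np.random.default_rng(seed)
    observed = scores.mean(axis=0)                             # mean score per run
    obs_diff = np.abs(observed[:, None] - observed[None, :])   # observed pairwise differences

    max_diffs = np.empty(n_trials)
    for t in range(n_trials):
        # Permute the run labels independently within each topic,
        # then record the largest difference between any two run means.
        permuted = np.array([rng.permutation(row) for row in scores])
        means = permuted.mean(axis=0)
        max_diffs[t] = means.max() - means.min()

    # p-value for a pair = fraction of trials whose family-wide maximum
    # difference is at least as large as that pair's observed difference.
    return (max_diffs[None, None, :] >= obs_diff[:, :, None]).mean(axis=2)

# Toy usage with hypothetical data: 50 topics, 4 runs.
scores = np.random.default_rng(1).random((50, 4))
pvals = randomised_tukey_hsd(scores, n_trials=1000)
```

Because the null distribution is built from the maximum difference across all runs in each trial, adding more runs to the comparison can only make a given pairwise difference harder to declare significant, which matches the conservatism reported above.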