The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Beyond independent relevance: methods and evaluation metrics for subtopic retrieval
Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Retrieval evaluation with incomplete information
Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Evaluating evaluation metrics based on the bootstrap
Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Novelty and diversity in information retrieval evaluation
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Rank-biased precision for measurement of retrieval effectiveness
ACM Transactions on Information Systems (TOIS)
How does clickthrough data reflect retrieval quality?
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Diversifying search results
Proceedings of the Second ACM International Conference on Web Search and Data Mining
An effectiveness measure for ambiguous and underspecified queries
Proceedings of the 2nd International Conference on the Theory of Information Retrieval: Advances in Information Retrieval Theory
Expected reciprocal rank for graded relevance
Proceedings of the 18th ACM Conference on Information and Knowledge Management
Diversifying web search results
Proceedings of the 19th International Conference on World Wide Web
Exploiting query reformulations for web search result diversification
Proceedings of the 19th International Conference on World Wide Web
Here or there: preference judgments for relevance
Proceedings of the 30th European Conference on IR Research: Advances in Information Retrieval
Do user preferences and evaluation measures line up?
Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval
A comparative analysis of cascade measures for novelty and diversity
Proceedings of the Fourth ACM International Conference on Web Search and Data Mining
An analysis of NP-completeness in novelty and diversity ranking
Information Retrieval
System effectiveness, user models, and user utility: a conceptual framework for investigation
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
Evaluating diversified search results using per-intent graded relevance
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval
A query-basis approach to parametrizing novelty-biased cumulative gain
Proceedings of the Third International Conference on Advances in Information Retrieval Theory
Learning to aggregate vertical results into web search results
Proceedings of the 20th ACM International Conference on Information and Knowledge Management
Intent-based diversification of web search results: metrics and algorithms
Information Retrieval
Using preference judgments for novel document retrieval
Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
Coverage-based search result diversification
Information Retrieval
Contextual and dimensional relevance judgments for reusable SERP-level evaluation
Proceedings of the 23rd International Conference on World Wide Web
Novel and diverse document ranking is an effective strategy that reduces redundancy in a ranked list so as to maximize the amount of novel and relevant information available to users. Evaluation for novelty and diversity typically involves an assessor judging each document for relevance against a set of pre-identified subtopics, which may be disambiguations of the query, facets of an information need, or nuggets of information. Alternatively, when expressing a \emph{preference} for document A or document B, users may implicitly take subtopics into account, but may also weigh other factors such as recency, readability, and length, each of which may be more or less important depending on the user. A \emph{user profile} captures the extent to which each factor, including subtopic relevance, influences the user's preference for one document over another. A preference-based evaluation can then use this profile information to better model utility across the space of users. In this work, we propose an evaluation framework that not only considers such implicit factors but also handles differences in user preference arising from varying underlying information needs. Our framework is based on the idea that a user scanning a ranked list from top to bottom and stopping at rank $k$ gains some utility from every document that is relevant to their information need. We therefore model the expected utility of a ranked list by estimating the utility of a document at a given rank from preference judgments, and we define evaluation measures on top of this model. We validate our framework by comparing it to existing measures, such as $\alpha$-nDCG, ERR-IA, and subtopic recall, that require explicit subtopic judgments. We show that our proposed measures correlate well with existing measures while having the potential to capture other factors when real data is used. We also show that the proposed measures can easily handle relevance assessments against multiple user profiles, and that they are robust to noisy and incomplete judgments.
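The expected-utility model described above admits a standard formalization. As a minimal sketch (not necessarily the paper's exact instantiation), assume the user stops at rank $k$ with probability given by a rank-biased persistence model with parameter $p$, and let $u(d_i)$ denote the utility of the document at rank $i$, estimated from preference judgments, e.g., as the fraction of pairwise preferences that $d_i$ wins under a given user profile. The expected utility of a ranked list $L = (d_1, \ldots, d_n)$ is then
\[
\mathbb{E}[U(L)] \;=\; \sum_{k=1}^{n} P(\mathrm{stop}=k) \sum_{i=1}^{k} u(d_i),
\qquad
P(\mathrm{stop}=k) \;=\; (1-p)\,p^{k-1}.
\]
Any stopping distribution could be substituted for the rank-biased one; the choice of $P(\mathrm{stop}=k)$ and of the preference-based estimator for $u(d_i)$ are the free parameters of such a measure.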