Variations in relevance judgments and the measurement of retrieval effectiveness
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Cumulated gain-based evaluation of IR techniques
ACM Transactions on Information Systems (TOIS)
Beyond independent relevance: methods and evaluation metrics for subtopic retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Retrieval evaluation with incomplete information
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Improving web search results using affinity graph
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Less is more: probabilistic models for retrieving fewer relevant documents
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating evaluation metrics based on the bootstrap
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
On GMAP: and other transformations
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
An experimental comparison of click position-bias models
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Learning diverse rankings with multi-armed bandits
Proceedings of the 25th international conference on Machine learning
Novelty and diversity in information retrieval evaluation
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Rank-biased precision for measurement of retrieval effectiveness
ACM Transactions on Information Systems (TOIS)
Proceedings of the Second ACM International Conference on Web Search and Data Mining
An Effectiveness Measure for Ambiguous and Underspecified Queries
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
An Analysis of NP-Completeness in Novelty and Diversity Ranking
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Expected reciprocal rank for graded relevance
Proceedings of the 18th ACM conference on Information and knowledge management
Click-based evidence for decaying weight distributions in search effectiveness metrics
Information Retrieval
Diversifying web search results
Proceedings of the 19th international conference on World wide web
Expected browsing utility for web search evaluation
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
On the informativeness of cascade and intent-aware effectiveness measures
Proceedings of the 20th international conference on World wide web
Evaluating diversified search results using per-intent graded relevance
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
On the suitability of diversity metrics for learning-to-rank for diversity
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Aggregated search result diversification
ICTIR'11 Proceedings of the Third international conference on Advances in information retrieval theory
Do subtopic judgments reflect diversity?
ICTIR'11 Proceedings of the Third international conference on Advances in information retrieval theory
Rank and relevance in novelty and diversity metrics for recommender systems
Proceedings of the fifth ACM conference on Recommender systems
Click the search button and be happy: evaluating direct and immediate information access
Proceedings of the 20th ACM international conference on Information and knowledge management
Evaluation with informational and navigational intents
Proceedings of the 21st international conference on World Wide Web
Diversity by proportionality: an election-based approach to search result diversification
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Explicit relevance models in intent-oriented information retrieval diversification
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Time-based calibration of effectiveness measures
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
On judgments obtained from a commercial search engine
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
On the role of novelty for search result diversification
Information Retrieval
Modeling user variance in time-biased gain
Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval
An analysis of systematic judging errors in information retrieval
Proceedings of the 21st ACM international conference on Information and knowledge management
The effect of aggregated search coherence on search behavior
Proceedings of the 21st ACM international conference on Information and knowledge management
Stochastic simulation of time-biased gain
Proceedings of the 21st ACM international conference on Information and knowledge management
Models and metrics: IR evaluation as a user process
Proceedings of the Seventeenth Australasian Document Computing Symposium
Using intent information to model user behavior in diversified search
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Summaries, ranked retrieval and sessions: a unified framework for information access evaluation
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Click model-based information retrieval metrics
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Preference based evaluation measures for novelty and diversity
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Summary of the NTCIR-10 INTENT-2 task: subtopic mining and search result diversification
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
The impact of intent selection on diversified search evaluation
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
On the reliability and intuitiveness of aggregated search metrics
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Augmenting web search surrogates with images
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Topic diversity in tag recommendation
Proceedings of the 7th ACM conference on Recommender systems
Learning to rank query suggestions for adhoc and diversity search
Information Retrieval
Diversified search evaluation: lessons from the NTCIR-9 INTENT task
Information Retrieval
Mining subtopics from text fragments for a web query
Information Retrieval
Increasing evaluation sensitivity to diversity
Information Retrieval
Choices in batch information retrieval evaluation
Proceedings of the 18th Australasian Document Computing Symposium
Hi-index | 0.00 |
Traditional editorial effectiveness measures, such as nDCG, remain standard for Web search evaluation. Unfortunately, these traditional measures can inappropriately reward redundant information and can fail to reflect the broad range of user needs that can underlie a Web query. To address these deficiencies, several researchers have recently proposed effectiveness measures for novelty and diversity. Many of these measures are based on simple cascade models of user behavior, which operate by considering the relationship between successive elements of a result list. The properties of these measures are still poorly understood, and it is not clear from prior research that they work as intended. In this paper we examine the properties and performance of cascade measures with the goal of validating them as tools for measuring effectiveness. We explore their commonalities and differences, placing them in a unified framework; we discuss their theoretical difficulties and limitations, and compare the measures experimentally, contrasting them against traditional measures and against other approaches to measuring novelty. Data collected by the TREC 2009 Web Track is used as the basis for our experimental comparison. Our results indicate that these measures reward systems that achieve an balance between novelty and overall precision in their result lists, as intended. Nonetheless, other measures provide insights not captured by the cascade measures, and we suggest that future evaluation efforts continue to report a variety of measures.