Increasing evaluation sensitivity to diversity

  • Authors:
  • Peter B. Golbus, College of Computer and Information Science, Northeastern University, Boston, USA 02115
  • Javed A. Aslam, College of Computer and Information Science, Northeastern University, Boston, USA 02115
  • Charles L. Clarke, School of Computer Science, University of Waterloo, Waterloo, Canada N2L 3G1

  • Venue:
  • Information Retrieval
  • Year:
  • 2013

Abstract

Many queries have multiple interpretations; they are ambiguous or underspecified. This is especially true in the context of Web search. To account for this, much recent research has focused on creating systems that produce diverse ranked lists. In order to validate these systems, several new evaluation measures have been created to quantify diversity. Ideally, diversity evaluation measures would distinguish between systems by the amount of diversity in the ranked lists they produce. Unfortunately, diversity is also a function of the collection over which the system is run and a system's performance at ad-hoc retrieval. A ranked list built from a collection that does not cover multiple subtopics cannot be diversified; neither can a ranked list that contains no relevant documents. To ensure that we are assessing systems by their diversity, we develop (1) a family of evaluation measures that take into account the diversity of the collection and (2) a meta-evaluation measure that explicitly controls for performance. We demonstrate experimentally that our new measures can achieve substantial improvements in sensitivity to diversity without reducing discriminative power.