Evaluation measures for preference judgments

Authors: Ben Carterette; Paul N. Bennett
Affiliations: University of Massachusetts Amherst, Amherst, MA, USA; Microsoft Research, Redmond, WA, USA
Venue: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Year: 2008

Citing 1

Here or there: preference judgments for relevance
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval

Cited 10

Thumbs-Up: a game for playing to rank search results
Proceedings of the ACM SIGKDD Workshop on Human Computation

Building a framework for the probability ranking principle by a family of expected weighted rank
ACM Transactions on Information Systems (TOIS)

Ranking from pairs and triplets: information quality, evaluation methods and query complexity
Proceedings of the fourth ACM international conference on Web search and data mining

System effectiveness, user models, and user utility: a conceptual framework for investigation
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Comparisons Instead of Ratings: Towards More Stable Preferences
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

A novel mobile recommender system for indoor shopping
Expert Systems with Applications: An International Journal

Top-k learning to rank: labeling, ranking and evaluation
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval

A mutual information-based framework for the analysis of information retrieval systems
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

SRbench--a benchmark for soundtrack recommendation systems
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Is top-k sufficient for ranking?
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Abstract

There has been recent interest in collecting user or assessor preferences, rather than absolute judgments of relevance, for the evaluation or learning of ranking algorithms. Since measures like precision, recall, and DCG are defined over absolute judgments, evaluation over preferences will require new evaluation measures that explicitly model them. We describe a class of such measures and compare absolute and preference measures over a large TREC collection.
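
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not taken from the paper, and the measures the authors actually define may differ. The function names `dcg` and `preference_precision`, the gain formula `2**g - 1`, and the rule that unranked documents count as ranked below all ranked ones are assumptions made for illustration only. The first function needs an absolute grade for every ranked document; the second needs only pairwise judgments of the form (preferred, other).

```python
from math import log2


def dcg(grades):
    """DCG over absolute relevance grades given in rank order.

    Assumes the common gain 2**g - 1 and a log2(rank + 1) discount.
    """
    return sum((2 ** g - 1) / log2(rank + 1)
               for rank, g in enumerate(grades, start=1))


def preference_precision(ranking, preferences):
    """Fraction of judged pairs that the ranking orders correctly.

    `preferences` holds (preferred, other) pairs. A hypothetical measure:
    unranked documents are treated as sitting below every ranked one, and
    pairs with neither document ranked are skipped.
    """
    position = {doc: i for i, doc in enumerate(ranking)}
    unranked = len(ranking)
    judged = correct = 0
    for preferred, other in preferences:
        if preferred not in position and other not in position:
            continue
        judged += 1
        if position.get(preferred, unranked) < position.get(other, unranked):
            correct += 1
    return correct / judged if judged else 0.0


# Hypothetical data: three documents, an assessor who prefers d1 > d2 > d3.
ranking = ["d2", "d1", "d3"]
preferences = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]
absolute_score = dcg([2, 3, 0])          # requires a grade per document
preference_score = preference_precision(ranking, preferences)
```

In this sketch the ranking misorders one of the three judged pairs, so the preference-based score is two thirds, and no document ever needed an absolute grade.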