Information retrieval systems have traditionally been evaluated using absolute judgments of relevance: each document is judged for relevance on its own, independent of other documents that may be on topic. We hypothesize that preference judgments of the form "document A is more relevant than document B" are easier for assessors to make than absolute judgments, and we provide evidence for this hypothesis through an assessor study. We then investigate methods for evaluating search engines using preference judgments. Furthermore, we show that by using inferences and careful selection of the pairs to judge, we need not compare all pairs of documents in order to apply these evaluation methods.
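The intuition behind avoiding all-pairs comparison can be sketched with a small illustration. If preferences are assumed transitive, ordering n documents by pairwise judgments is just comparison sorting, which needs O(n log n) judged pairs rather than all n(n-1)/2. The sketch below is hypothetical (the oracle, document names, and counter are illustrative, not the paper's method): a relevance oracle stands in for a human assessor, and a counter records how many pairs were actually judged.

```python
from functools import cmp_to_key

def rank_by_preference(docs, prefer, counter):
    """Order docs from most to least relevant via pairwise preferences.

    prefer(a, b) returns True if the assessor judges a more relevant
    than b.  Sorting exploits transitivity, so only O(n log n) pairs
    are judged instead of all n*(n-1)/2 possible pairs.
    """
    def cmp(a, b):
        counter[0] += 1          # one assessor judgment per comparison
        return -1 if prefer(a, b) else 1
    return sorted(docs, key=cmp_to_key(cmp))

# Hypothetical oracle: hidden relevance scores stand in for an assessor.
true_relevance = {"d1": 3, "d2": 1, "d3": 5, "d4": 2, "d5": 4}
counter = [0]
ranking = rank_by_preference(
    list(true_relevance),
    lambda a, b: true_relevance[a] > true_relevance[b],
    counter,
)
print(ranking)     # most relevant document first
print(counter[0])  # judged pairs: fewer than the 10 possible for 5 docs
```

In practice the paper's selection of pairs is more involved than plain sorting (and real assessors are noisier than a transitive oracle), but the sketch shows why inference over judged pairs makes preference-based evaluation tractable.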