Information retrieval systems have traditionally been evaluated using absolute judgments of relevance: each document is judged for relevance on its own, independent of other documents that may be on topic. We hypothesize that preference judgments of the form "document A is more relevant than document B" are easier for assessors to make than absolute judgments, and we provide evidence for this hypothesis through an assessor study. We then investigate methods for evaluating search engines using preference judgments. Furthermore, we show that by using inferences and careful selection of the pairs to judge, we need not compare all pairs of documents in order to apply these evaluation methods.
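The intuition behind avoiding all-pairs comparison can be sketched with a small illustration. If preferences are assumed transitive, ordering n documents by pairwise judgments is just comparison sorting, which needs O(n log n) judged pairs rather than all n(n-1)/2. The sketch below is hypothetical (the oracle, document names, and counter are illustrative, not the paper's method): a relevance oracle stands in for a human assessor, and a counter records how many pairs were actually judged.

```python
from functools import cmp_to_key

def rank_by_preference(docs, prefer, counter):
    """Order docs from most to least relevant via pairwise preferences.

    prefer(a, b) returns True if the assessor judges a more relevant
    than b.  Sorting exploits transitivity, so only O(n log n) pairs
    are judged instead of all n*(n-1)/2 possible pairs.
    """
    def cmp(a, b):
        counter[0] += 1          # one assessor judgment per comparison
        return -1 if prefer(a, b) else 1
    return sorted(docs, key=cmp_to_key(cmp))

# Hypothetical oracle: hidden relevance scores stand in for an assessor.
true_relevance = {"d1": 3, "d2": 1, "d3": 5, "d4": 2, "d5": 4}
counter = [0]
ranking = rank_by_preference(
    list(true_relevance),
    lambda a, b: true_relevance[a] > true_relevance[b],
    counter,
)
print(ranking)     # most relevant document first
print(counter[0])  # judged pairs: fewer than the 10 possible for 5 docs
```

In practice the paper's selection of pairs is more involved than plain sorting (and real assessors are noisier than a transitive oracle), but the sketch shows why inference over judged pairs makes preference-based evaluation tractable.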