Obtaining judgments from human raters is a vital part of designing search engine evaluation. Today, a discrepancy exists between how judgments are acquired from raters (the training phase) and how the responses are used for retrieval evaluation (the evaluation phase), because the information is represented inconsistently across the two phases. During training, raters are asked to provide a relevance score for an individual result in the context of a query, whereas evaluation is performed on ordered lists of search results, taking each result's position relative to the others into account. As an alternative to learning to rank from relevance judgments on individual search results, increasing attention has recently turned to the theory and practice of learning from answers to combinatorial questions about sets of search results; that is, during training, raters are asked to rank small sets (typically pairs). We first compare human raters' responses to questions about the relevance of individual results with their responses to questions about the relevance of pairs of results. We show empirically that neither type of response can be deduced from the other, and that the added context created when results are shown together changes the raters' evaluation process. Since pairwise judgments are directly related to ranking, we conclude that they are more accurate for that purpose. Going beyond pairs, we show that triplets do not contain significantly more information than pairs for the purpose of measuring statistical preference. Together, these two results establish good stability properties of pairwise comparisons for learning to rank. We further analyze scenarios in which results of varying quality are added as "decoys". A recurring concern in work on pairwise comparison is the quadratic number of pairs in a set of results: which preferences should we choose to solicit from paid raters, and can the quadratic cost be provably eliminated? We employ results from statistical learning theory to show that the quadratic cost can be provably eliminated in certain cases. More precisely, we show that in order to obtain a ranking in which each element is, on average, O(n/C) positions away from its position in the optimal ranking, one needs to sample O(nC²) pairs uniformly at random, for any C > 0. We also present an active learning algorithm that samples the pairs adaptively, and conjecture that it provides additional improvement.
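As a rough, self-contained illustration of this sampling trade-off (and not the algorithm or analysis from the paper), the Python sketch below samples pairs uniformly at random from a noiseless ground-truth ordering, aggregates them with a simple win-count heuristic, and reports how far items land, on average, from their true positions as the O(nC²) budget grows. The value n = 500, the Borda-style aggregation, and the noiseless preference model are all illustrative assumptions.

import random

def sample_and_rank(n, C, rng):
    # Ground truth: item i has true position i; comparisons are noiseless.
    wins = [0] * n
    num_pairs = int(n * C * C)  # budget of roughly n * C^2 uniformly random pairs
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)
        # The better-positioned (lower-index) item wins the comparison.
        winner = i if i < j else j
        wins[winner] += 1
    # Heuristic aggregation (not the paper's method): rank items by win count,
    # breaking ties randomly so sparse sampling is not rewarded by accident.
    order = list(range(n))
    rng.shuffle(order)
    estimated = sorted(order, key=lambda item: -wins[item])
    position = {item: pos for pos, item in enumerate(estimated)}
    # Average displacement of each item from its true position.
    return sum(abs(position[item] - item) for item in range(n)) / n

rng = random.Random(0)
n = 500
for C in (1, 2, 4, 8):
    disp = sample_and_rank(n, C, rng)
    print(f"C={C}: {n * C * C} sampled pairs, average displacement ~ {disp:.1f}")

Running the sketch shows the average displacement shrinking as the pair budget grows, which is the qualitative behavior the bound describes; the paper's adaptive (active learning) sampler is conjectured to improve on this uniform scheme.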