Ranking from pairs and triplets: information quality, evaluation methods and query complexity

  • Authors:
  • Kira Radinsky; Nir Ailon

  • Affiliations:
  • Technion, Haifa, Israel; Technion, Haifa, Israel

  • Venue:
  • Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM '11)
  • Year:
  • 2011

Abstract

Obtaining judgments from human raters is a vital part of designing search engine evaluation. Today, a discrepancy exists between how judgments are acquired from raters (the training phase) and how the responses are used for retrieval evaluation (the evaluation phase). The discrepancy stems from an inconsistency in how information is represented in the two phases. During training, raters are asked to provide a relevance score for an individual result in the context of a query, whereas evaluation is performed on ordered lists of search results, where each result's position relative to the others is taken into account. As an alternative to the practice of learning to rank from relevance judgments on individual search results, increasing attention has recently been given to the theory and practice of learning from answers to combinatorial questions about sets of search results; that is, during training, users are asked to rank small sets (typically pairs). We first compare human rater responses to questions about the relevance of individual results with their responses to questions about the relevance of pairs of results. We show empirically that neither type of response can be deduced from the other, and that the added context created when results are shown together changes the raters' evaluation process. Since pairwise judgments are directly related to ranking, we conclude that they are more accurate for that purpose. Going beyond pairs, we show that triplets do not contain significantly more information than pairs for the purpose of measuring statistical preference. Together, these two results establish good stability properties of pairwise comparisons for learning to rank. We further analyze scenarios in which results of varying quality are added as "decoys". A recurring concern in work on pairwise comparison is the quadratic number of pairs in a set of results: which preferences do we choose to solicit from paid raters, and can the quadratic cost be provably eliminated? We employ results from statistical learning theory to show that the quadratic cost can be provably eliminated in certain cases. More precisely, we show that in order to obtain a ranking in which each element is, on average, O(n/C) positions away from its position in the optimal ranking, one needs to sample O(nC^2) pairs uniformly at random, for any C > 0. We also present an active learning algorithm that samples pairs adaptively, and conjecture that it provides additional improvement.
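
The uniform-sampling bound quoted above can be illustrated with a minimal sketch: the snippet below draws roughly nC^2 uniformly random pairs, queries a (possibly noisy) pairwise comparator, and aggregates the outcomes into a ranking. The `compare` callback, the Borda-style win-rate aggregation, and the function name are illustrative assumptions for this sketch, not the paper's algorithm.

```python
import random
from collections import defaultdict

def rank_from_sampled_pairs(items, compare, C=2.0, seed=None):
    """Rank `items` from roughly n*C^2 uniformly sampled pairwise comparisons.

    `compare(a, b)` should return True if a is preferred to b (answers may be
    noisy). The Borda-style win-rate aggregation below is an illustrative
    choice, not the aggregation used in the paper.
    """
    rng = random.Random(seed)
    n = len(items)
    num_pairs = max(1, int(n * C * C))  # O(n C^2) queries, per the abstract's bound

    wins = defaultdict(int)         # comparisons each item won
    appearances = defaultdict(int)  # comparisons each item took part in

    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # one uniformly random pair of distinct items
        if compare(items[i], items[j]):
            wins[i] += 1
        else:
            wins[j] += 1
        appearances[i] += 1
        appearances[j] += 1

    # Order items by empirical win rate; items never sampled keep rate 0.
    def win_rate(idx):
        return wins[idx] / appearances[idx] if appearances[idx] else 0.0

    return [items[idx] for idx in sorted(range(n), key=win_rate, reverse=True)]


if __name__ == "__main__":
    # Toy example: the "true" preference is ascending order of integers,
    # and the simulated rater answers correctly 90% of the time.
    random.seed(0)
    truth = list(range(20))
    noisy = lambda a, b: (a < b) == (random.random() < 0.9)
    print(rank_from_sampled_pairs(truth, noisy, C=4.0, seed=0))
```

The adaptive (active learning) variant mentioned in the abstract would instead choose each queried pair based on the responses collected so far; it is not sketched here.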