Top-k learning to rank: labeling, ranking and evaluation

Authors:
Shuzi Niu;Jiafeng Guo;Yanyan Lan;Xueqi Cheng
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Venue:
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Year:
2012



Abstract

In this paper, we propose a novel top-k learning to rank framework, which comprises a labeling strategy, a ranking model, and an evaluation measure. The motivation comes from the difficulty of obtaining reliable relevance judgments from human assessors when applying learning to rank in real search systems. The traditional absolute relevance judgment method is difficult in both gradation specification and human assessment, resulting in a high level of disagreement on judgments. Pairwise preference judgment is a good alternative, but it is often criticized for increasing the complexity of judgment from O(n) to O(n log n). Considering that users mainly care about the top-ranked search results, we propose a novel top-k labeling strategy that adopts pairwise preference judgment to generate the top k ordered items from n documents (i.e., the top-k ground-truth) in a manner similar to that of HeapSort. As a result, the complexity of judgment is reduced to O(n log k). With the top-k ground-truth, traditional ranking models (e.g., pairwise or listwise models) and evaluation measures (e.g., NDCG) no longer fit the data set. Therefore, we introduce a new ranking model, namely FocusedRank, which fully captures the characteristics of the top-k ground-truth. We also extend the widely used evaluation measures NDCG and ERR to be applicable to the top-k ground-truth, referred to as κ-NDCG and κ-ERR, respectively. Finally, we conduct extensive experiments on benchmark data collections to demonstrate the efficiency and effectiveness of our top-k labeling strategy and ranking models.
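
To make the O(n log k) judgment cost concrete, the following is a minimal sketch of heap-based top-k selection driven only by pairwise preference judgments. It is not the authors' exact labeling procedure, which the abstract describes only as "similar to that of HeapSort". The function name `top_k_by_preference` and the `prefers(a, b)` oracle (standing in for a human assessor) are hypothetical, and the sketch assumes the assessor's preferences are consistent (transitive).

```python
import heapq


def top_k_by_preference(docs, k, prefers):
    """Return (top-k docs, best first; number of pairwise judgments used).

    prefers(a, b) is a hypothetical oracle standing in for a human assessor:
    it returns True when document a is preferred over document b.
    Assumes preferences are consistent (transitive).
    """
    if k <= 0:
        return [], 0

    judgments = 0

    class Item:
        """Wraps a document so that heapq's '<' test triggers one judgment."""
        __slots__ = ("doc",)

        def __init__(self, doc):
            self.doc = doc

        def __lt__(self, other):
            # "Smaller" means less preferred, so the heap root is always
            # the weakest document among the current top-k candidates.
            nonlocal judgments
            judgments += 1
            return prefers(other.doc, self.doc)

    heap = []
    for doc in docs:
        item = Item(doc)
        if len(heap) < k:
            heapq.heappush(heap, item)       # O(log k) judgments
        elif heap[0] < item:                 # one judgment against the weakest
            heapq.heapreplace(heap, item)    # O(log k) judgments

    # Pop weakest-first, then reverse to obtain a best-first ordering.
    ordered = [heapq.heappop(heap).doc for _ in range(len(heap))]
    ordered.reverse()
    return ordered, judgments


if __name__ == "__main__":
    # Toy example: documents are integers and larger means more relevant.
    docs = [(i * 37) % 101 for i in range(101)]  # a permutation of 0..100
    top, used = top_k_by_preference(docs, 5, lambda a, b: a > b)
    print(top, used)
```

Each of the n documents triggers at most O(log k) judgments because the heap never holds more than k items, which is where the O(n log k) bound comes from. Documents outside the top k are never compared with one another beyond what the heap requires, so the result is a total order over the top k only.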