Partitioning sparse matrices with eigenvectors of graphs
SIAM Journal on Matrix Analysis and Applications
A training algorithm for optimal margin classifiers
COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
How reliable are the results of large-scale information retrieval experiments?
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Boosting in the limit: maximizing the margin of learned ensembles
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Variations in relevance judgments and the measurement of retrieval effectiveness
Information Processing and Management: an International Journal
Cumulated gain-based evaluation of IR techniques
ACM Transactions on Information Systems (TOIS)
Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
An efficient boosting algorithm for combining preferences
The Journal of Machine Learning Research
Discriminative models for information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Learning to rank using gradient descent
ICML '05 Proceedings of the 22nd international conference on Machine learning
Learning to rank: from pairwise approach to listwise approach
Proceedings of the 24th international conference on Machine learning
Query dependent ranking using K-nearest neighbor
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Relevance assessment: are judges exchangeable and does it matter
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Improving quality of training data for learning to rank using click-through data
Proceedings of the third ACM international conference on Web search and data mining
Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels
The Journal of Machine Learning Research
Clustering-based transduction for learning a ranking model with limited human labels
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
This paper is concerned with the quality of training data in learning to rank for information retrieval. While many data selection techniques have been proposed to improve the quality of training data for classification, the study on the same issue for ranking appears to be insufficient. As pointed out in this paper, it is inappropriate to extend technologies for classification to ranking, and the development of novel technologies is sorely needed. In this paper, we study the development of such technologies. To begin with, we propose the concept of ''pairwise preference consistency'' (PPC) to describe the quality of a training data collection from the ranking point of view. PPC takes into consideration the ordinal relationship between documents as well as the hierarchical structure on queries and documents, which are both unique properties of ranking. Then we select a subset of the original training documents, by maximizing the PPC of the selected subset. We further propose an efficient solution to the maximization problem. Empirical results on the LETOR benchmark datasets and a web search engine dataset show that with the subset of training data selected by our approach, the performance of the learned ranking model can be significantly improved.