Query sampling for ranking learning in web search

Authors:
Linjun Yang; Li Wang; Bo Geng; Xian-Sheng Hua
Affiliations:
Microsoft Research Asia, Beijing, China; University of Science and Technology of China, Hefei, China; Peking University, Beijing, China; Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

Learning to rank has recently become a popular approach to building ranking models for Web search. We observe that the composition of the training set greatly influences the performance of the learned ranking model. Meanwhile, the number of queries in Web search is nearly infinite and human labeling is expensive, so a subset of queries needs to be carefully selected for training. In this paper, we develop a greedy algorithm to sample queries that simultaneously takes query density, difficulty, and diversity into consideration. Experimental results on a collected Web search dataset comprising 2024 queries show that the proposed method can lead to a more informative training set for building an effective model.
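
The abstract does not give the scoring function, so the following is only a minimal sketch of what a greedy query sampler combining density, difficulty, and diversity might look like. Everything in it is an assumption made for illustration: the function name `greedy_sample`, the use of cosine similarity over query feature vectors, the linear combination of the three terms, and the weights. It should not be read as the authors' algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_sample(features, difficulty, k,
                  w_density=1.0, w_difficulty=1.0, w_diversity=1.0):
    """Greedily pick k query indices (hypothetical scoring, not from the paper).

    features   -- one feature vector per query (assumed representation)
    difficulty -- one difficulty estimate per query, higher means harder (assumed given)
    """
    n = len(features)
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]

    # Density (assumed definition): mean similarity to all other queries.
    density = [(sum(sim[i]) - sim[i][i]) / max(n - 1, 1) for i in range(n)]

    selected = []
    remaining = set(range(n))
    while remaining and len(selected) < k:
        def score(i):
            # Diversity is enforced by penalizing similarity to queries already chosen.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return (w_density * density[i]
                    + w_difficulty * difficulty[i]
                    - w_diversity * redundancy)

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: two tight clusters of two queries each, equal difficulty.
    toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    toy_difficulty = [0.5, 0.5, 0.5, 0.5]
    print(greedy_sample(toy_features, toy_difficulty, k=2))
```

With the toy data above, the redundancy penalty should steer the second pick away from the cluster of the first, so the two selected queries come from different clusters. In practice, the difficulty estimates and the relative weights would have to come from whatever predictor and tuning procedure the full paper describes.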