Query sampling for ranking learning in web search

  • Authors:
  • Linjun Yang;Li Wang;Bo Geng;Xian-Sheng Hua

  • Affiliations:
  • Microsoft Research Asia, Beijing, China;University of Science and Technology of China, Hefei, China;Peking University, Beijing, China;Microsoft Research Asia, Beijing, China

  • Venue:
  • Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2009

Abstract

Learning to rank has recently become a popular approach to building ranking models for Web search. We observe that the composition of the training set greatly influences the performance of the learned ranking model. At the same time, the number of distinct queries in Web search is practically unbounded and human labeling is expensive, so a subset of queries needs to be carefully selected for training. In this paper, we develop a greedy algorithm that samples queries by simultaneously taking query density, difficulty, and diversity into consideration. Experimental results on a collected Web search dataset comprising 2024 queries show that the proposed method leads to a more informative training set for building an effective ranking model.
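
The abstract does not give the selection criterion itself, so the following is only a minimal sketch of what a greedy sampler combining the three factors could look like, not the authors' method. Everything in it is an assumption made for illustration: each query is represented by a hypothetical feature vector, density is approximated by mean cosine similarity to the other queries, difficulty is taken as a precomputed per-query score, diversity is one minus the highest similarity to any query already selected, and the three terms are combined with hypothetical weights.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_sample(features, difficulty, k,
                  w_density=1.0, w_difficulty=1.0, w_diversity=1.0):
    """Greedily pick up to k query indices.

    features   -- one feature vector per query (hypothetical representation)
    difficulty -- one precomputed difficulty score per query (assumed given)
    w_*        -- hypothetical weights for the three terms
    """
    n = len(features)
    sim = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]

    # Assumed density proxy: mean similarity to every other query.
    density = [(sum(sim[i]) - sim[i][i]) / max(n - 1, 1) for i in range(n)]

    selected = []
    remaining = set(range(n))
    while remaining and len(selected) < k:

        def score(i):
            # Assumed diversity proxy: distance from the closest selected query.
            diversity = 1.0 - max((sim[i][j] for j in selected), default=0.0)
            return (w_density * density[i]
                    + w_difficulty * difficulty[i]
                    + w_diversity * diversity)

        # Iterate in index order so ties resolve to the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch the first pick is driven by density and difficulty alone, since the diversity term is identical for every candidate while nothing has been selected; later picks are penalized for being close to queries already in the sample. A call such as `greedy_sample(features, difficulty, k=100)` returns the indices of the queries to send for labeling.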