Ranking specialization for web search: a divide-and-conquer approach by using topical RankSVM

  • Authors:
  • Jiang Bian;Xin Li;Fan Li;Zhaohui Zheng;Hongyuan Zha

  • Affiliations:
  • Georgia Institute of Technology, Atlanta, GA, USA;Yahoo! Labs, Sunnyvale, CA, USA;Yahoo! Labs, Sunnyvale, CA, USA;Yahoo! Labs, Sunnyvale, CA, USA;Georgia Institute of Technology, Atlanta, GA, USA

  • Venue:
  • Proceedings of the 19th international conference on World wide web
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many ranking algorithms applying machine learning techniques have been proposed in informational retrieval and Web search. However, most of existing approaches do not explicitly take into account the fact that queries vary significantly in terms of ranking and entail different treatments regarding the ranking models. In this paper, we apply a divide-and-conquer framework for ranking specialization, i.e. learning multiple ranking models by addressing query difference. We first generate query representation by aggregating ranking features through pseudo feedbacks, and employ unsupervised clustering methods to identify a set of ranking-sensitive query topics based on training queries. To learn multiple ranking models for respective ranking-sensitive query topics, we define a global loss function by combining the ranking risks of all query topics, and we propose a unified SVM-based learning process to minimize the global loss. Moreover, we employ an ensemble approach to generate the ranking result for each test query by applying a set of ranking models of the most appropriate query topics. We conduct experiments using a benchmark dataset for learning ranking functions as well as a dataset from a commercial search engine. Experimental results show that our proposed approach can significantly improve the ranking performance over existing single-model approaches as well as straightforward local ranking approaches, and the automatically identified ranking-sensitive topics are more useful for enhancing ranking performance than pre-defined query categorization.