Learning to distribute queries into web search nodes

  • Authors:
  • Marcelo Mendoza;Mauricio Marín;Flavio Ferrarotti;Barbara Poblete

  • Affiliations:
  • Yahoo! Research Latin America, Santiago, Chile;Yahoo! Research Latin America, Santiago, Chile;Yahoo! Research Latin America, Santiago, Chile;Yahoo! Research Latin America, Santiago, Chile

  • Venue:
  • ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
  • Year:
  • 2010

Abstract

Web search engines are composed of a large set of search nodes and a broker machine that feeds them with queries. A location cache keeps minimal information in the broker to register the search nodes capable of producing the top-N results for frequent queries. In this paper we show that it is possible to use the location cache as a training dataset for a standard machine learning algorithm and build a predictive model of the search nodes expected to produce the best approximate results for queries. This can be used to prevent the broker from sending queries to all search nodes during sudden peaks in query traffic and, as a result, avoid search node saturation. This paper proposes a logistic regression model to quickly predict the most pertinent search nodes for a given query.
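The idea of training on the location cache can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features or the training procedure, so the code below assumes binary query-term features, one independent logistic regression per search node (one-vs-rest), and plain stochastic gradient descent. The cache layout and the names `train_node_models` and `predict_nodes` are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_node_models(location_cache, num_nodes, epochs=200, lr=0.5):
    """Fit one binary logistic regression per search node.

    location_cache maps a query string to the set of node ids that
    produced its top-N results (assumed layout, for illustration only).
    """
    models = []
    for node in range(num_nodes):
        weights = defaultdict(float)  # one weight per query term
        bias = 0.0
        for _ in range(epochs):
            for query, nodes in location_cache.items():
                terms = set(query.split())
                label = 1.0 if node in nodes else 0.0
                score = bias + sum(weights[t] for t in terms)
                error = sigmoid(score) - label  # gradient of the log loss
                bias -= lr * error
                for t in terms:
                    weights[t] -= lr * error
        models.append((dict(weights), bias))
    return models


def predict_nodes(models, query, k=2):
    """Return the k node ids with the highest predicted probability."""
    terms = set(query.split())
    scored = []
    for node, (weights, bias) in enumerate(models):
        # Terms never seen in the cache contribute a weight of zero.
        p = sigmoid(bias + sum(weights.get(t, 0.0) for t in terms))
        scored.append((p, node))
    # Highest probability first; ties broken by lower node id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [node for _, node in scored[:k]]


# Toy location cache (made-up data): query -> nodes holding its top-N results.
toy_cache = {
    "cheap flights": {0, 1},
    "flights santiago": {0, 1},
    "python tutorial": {2, 3},
    "python regression": {2, 3},
}
models = train_node_models(toy_cache, num_nodes=4)
selected = predict_nodes(models, "flights chile", k=2)
print(selected)
```

Under these assumptions, a broker facing a traffic peak would forward the query only to the nodes in `selected` instead of broadcasting it to all four. On this toy cache the unseen query "flights chile" is routed to nodes 0 and 1, because the shared term "flights" carries a positive weight only in those nodes' models.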