Learning to distribute queries into web search nodes

Authors:
Marcelo Mendoza;Mauricio Marín;Flavio Ferrarotti;Barbara Poblete
Affiliations:
Yahoo! Research Latin America, Santiago, Chile;Yahoo! Research Latin America, Santiago, Chile;Yahoo! Research Latin America, Santiago, Chile;Yahoo! Research Latin America, Santiago, Chile
Venue:
ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
Year:
2010

Abstract

Web search engines are composed of a large set of search nodes and a broker machine that feeds them with queries. A location cache keeps minimal information in the broker to record which search nodes can produce the top-N results for frequent queries. In this paper we show that the location cache can serve as a training dataset for a standard machine learning algorithm, yielding a predictive model of the search nodes expected to produce the best approximate results for a query. This model can keep the broker from sending queries to all search nodes during sudden peaks in query traffic and thereby avoid search node saturation. We propose a logistic regression model to quickly predict the most pertinent search nodes for a given query.
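The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `LOCATION_CACHE`, the query-term features, the one-classifier-per-node setup, the plain stochastic gradient descent training, and the names `train` and `predict_nodes` are all assumptions made for illustration. The sketch treats each cached query as a training example whose labels are the search nodes that produced its top-N results, then ranks nodes for a query by predicted probability.

```python
import math
from collections import defaultdict

# Hypothetical location cache: query -> ids of the search nodes that
# produced its top-N results. Real caches would be far larger.
LOCATION_CACHE = {
    "cheap flights santiago": {0, 1},
    "flights to lima": {0},
    "santiago weather": {1},
    "weather forecast lima": {1, 2},
    "football results": {2},
    "football world cup": {2},
}
NUM_NODES = 3


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(cache, num_nodes, epochs=200, lr=0.5):
    """Fit one binary logistic regression per node on query-term features."""
    weights = [defaultdict(float) for _ in range(num_nodes)]
    bias = [0.0] * num_nodes
    for _ in range(epochs):
        for query, nodes in cache.items():
            terms = set(query.split())
            for n in range(num_nodes):
                label = 1.0 if n in nodes else 0.0
                p = sigmoid(bias[n] + sum(weights[n][t] for t in terms))
                grad = p - label  # gradient of the log loss w.r.t. the score
                bias[n] -= lr * grad
                for t in terms:
                    weights[n][t] -= lr * grad
    return weights, bias


def predict_nodes(query, weights, bias, k=1):
    """Return the k node ids with the highest predicted probability."""
    terms = set(query.split())
    scored = []
    for n in range(len(bias)):
        z = bias[n] + sum(weights[n].get(t, 0.0) for t in terms)
        scored.append((sigmoid(z), n))
    scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken by node id
    return [n for _, n in scored[:k]]
```

Under these assumptions, a broker facing a traffic peak could call `train(LOCATION_CACHE, NUM_NODES)` once offline and then route each incoming query only to the nodes returned by `predict_nodes(query, weights, bias, k)`, with `k` chosen according to the current load.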