Batch query processing for web search engines

Authors:
Shuai Ding;Josh Attenberg;Ricardo Baeza-Yates;Torsten Suel
Affiliations:
Polytechnic Institute of NYU, Brooklyn, NY, USA;Polytechnic Institute of NYU, Brooklyn, NY, USA;Yahoo Research, Barcelona, Spain;Polytechnic Institute of NYU, Brooklyn, USA
Venue:
Proceedings of the fourth ACM international conference on Web search and data mining
Year:
2011

Citing 34
Cited 2

Filtered document retrieval with frequency-sorted indexes

Journal of the American Society for Information Science
Compressed inverted files with reduced decoding overheads

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Managing gigabytes (2nd ed.): compressing and indexing documents and images

Managing gigabytes (2nd ed.): compressing and indexing documents and images
Efficient passage ranking for document databases

ACM Transactions on Information Systems (TOIS)
Probe, count, and classify: categorizing hidden web databases

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Rank-preserving two-level caching for scalable search engines

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Web caching with request reordering

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Modern Information Retrieval

Modern Information Retrieval
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Predictive caching and prefetching of query results in search engines

WWW '03 Proceedings of the 12th international conference on World Wide Web
Design and Implementation of a High-Performance Distributed Web Crawler

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
New results on web caching with request reordering

Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Three-level caching for efficient query processing in large Web search engines

WWW '05 Proceedings of the 14th international conference on World Wide Web
Inverted files for text search engines

ACM Computing Surveys (CSUR)
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
The impact of caching on search engines

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Pruning policies for two-tiered inverted index with correctness guarantee

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Heavy-tailed distributions and multi-keyword queries

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Optimizing result prefetching in web search engines with segmented indices

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Optimized query execution in large search engines with global page ordering

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Performance of compressed inverted list caching in search engines

Proceedings of the 17th international conference on World Wide Web
ResIn: a combination of results caching and index pruning for high-performance web search engines
The Generalized Maximum Coverage Problem

Information Processing Letters
Challenges in building large-scale information retrieval systems: invited talk

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Top-k aggregation using intersections of ranked inputs

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Inverted index compression and query processing with optimized document ordering

Proceedings of the 18th international conference on World wide web
Improved techniques for result caching in web search engines

Proceedings of the 18th international conference on World wide web
A study of replacement algorithms for a virtual-storage computer

IBM Systems Journal
Fast query evaluation through document identifier assignment for inverted file-based information retrieval systems

Information Processing and Management: an International Journal
A refreshing perspective of search engine caching

Proceedings of the 19th international conference on World wide web
Scalable techniques for document identifier assignment in inverted indexes

Proceedings of the 19th international conference on World wide web
A large-scale active learning system for topical categorization on the web

Proceedings of the 19th international conference on World wide web
Why label when you can search?: alternatives to active learning for applying human resources to build classification models under extreme class imbalance

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

An evaluation of fault-tolerant query processing for web search engines

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Prefetching query results and its impact on search engines

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

Large web search engines are now processing billions of queries per day. Most of these queries are interactive in nature, requiring a response in fractions of a second. However, there are also a number of important scenarios where large batches of queries are submitted for various web mining and system optimization tasks that do not require an immediate response. Given the significant cost of executing search queries over billions of web pages, it is a natural question to ask if such batches of queries can be more efficiently executed than interactive queries. In this paper, we motivate and discuss the problem of batch query processing in search engines, identify basic mechanisms for improving the performance of such queries, and provide a preliminary experimental evaluation of the proposed techniques. Our conclusion is that significant cost reductions are possible by using specialized mechanisms for executing batch queries in Web search engines.